Message ID | 20221121104403.1545f9b5@gandalf.local.home |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:f944:0:0:0:0:0 with SMTP id q4csp1670232wrr; Mon, 21 Nov 2022 07:53:36 -0800 (PST) X-Google-Smtp-Source: AA0mqf44NmYTgY8WkPRGZGYE7wm8K2+2PW2xAfjX63WYVOW2dI5qtXoHY1V1eQhVnmAt+xZaGiV9 X-Received: by 2002:a17:907:50a2:b0:7ad:a34f:1efe with SMTP id fv34-20020a17090750a200b007ada34f1efemr16087840ejc.350.1669046016001; Mon, 21 Nov 2022 07:53:36 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1669046015; cv=none; d=google.com; s=arc-20160816; b=ksDu4goamq/OlnzMPLefhpFJWQ8lmOc8+pzEfx49Ol78xrn6AXQpx5W/aOh30X33FH krqZtn7ZBW+fHUrhx36+WXrrSBTUdquzEzL46QroNrc+sGZNwXxWvqOjaYqLFDcr0qHs ADmPZY7XsSF/ZZg3pvIyhZKnZXY5qCmF8QNM4egQaskjwzrbYLTtCmbkOp1OAEy/49YT WsevOyL8PajqzYqb9evqzKIQcmJ4qQnFy0xZR7St2onqlpJIM1evMguw8t7qYJ7mYpG0 pMCjASV18LD/D+u6kpeDywulHjwbTBc0KeErlfHI0QlkA7OowB80H9EGBouclErwH+Fc g94Q== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :message-id:subject:cc:to:from:date; bh=t/VbDmj2MOTi4wdMzZGg5B8szOYvx6l5tiyGnvKZCW8=; b=rcX7jN2bMtkV3v1/y33IEmkvPFCEpnYgUc2e0pnLCf3Xk/ZyRqaAqRtcnzTmd9w46e ZgeBFOh/IQegI8pkrCZGzmysSzSGWQRFAqA9UDt5iurt0976AkbCppG4ED2uOP4LlkhZ jwzXkPwemiq50P4nqMr+twtU94L8us04hsQziCxjxZWuFME30Xr4fkg6Lhy4KLDRju7n 2kX4+omeNP3cqTeT0ra7qkuOcOQOrmbr90By6KCMq/Hq8cOdXgBV6MXxg5gF2q9yKJfL mGTwEAZq2lYFAZLnprHMERZ7/gpE3Xb8qMVvTb5U2TyMmgQtom0QCfnJaor9JIshDj9c uIbA== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id w13-20020a170906130d00b007ae664aa7dbsi8429229ejb.877.2022.11.21.07.53.10; Mon, 21 Nov 2022 07:53:35 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231755AbiKUPpD (ORCPT <rfc822;cjcooper78@gmail.com> + 99 others); Mon, 21 Nov 2022 10:45:03 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45856 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232255AbiKUPo3 (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Mon, 21 Nov 2022 10:44:29 -0500 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6D43DCD97D for <linux-kernel@vger.kernel.org>; Mon, 21 Nov 2022 07:44:07 -0800 (PST) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 0490C612CB for <linux-kernel@vger.kernel.org>; Mon, 21 Nov 2022 15:44:07 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id EAACDC433D6; Mon, 21 Nov 2022 15:44:04 +0000 (UTC) Date: Mon, 21 Nov 2022 10:44:03 -0500 From: Steven Rostedt <rostedt@goodmis.org> To: LKML <linux-kernel@vger.kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org>, Masami Hiramatsu <mhiramat@kernel.org>, Andrew Morton <akpm@linux-foundation.org>, Peter Zijlstra <peterz@infradead.org>, Kees Cook <keescook@chromium.org>, Josh Poimboeuf <jpoimboe@redhat.com>, KP Singh <kpsingh@kernel.org>, Chris Mason <clm@meta.com>, Mark Rutland <mark.rutland@arm.com>, Alexei Starovoitov <alexei.starovoitov@gmail.com>, Florent Revest <revest@chromium.org>, Greg Kroah-Hartman <gregkh@linuxfoundation.org>, Christoph Hellwig <hch@infradead.org> Subject: [PATCH] error-injection: Add prompt for function error injection Message-ID: <20221121104403.1545f9b5@gandalf.local.home> X-Mailer: Claws Mail 3.17.8 (GTK+ 2.24.33; x86_64-pc-linux-gnu) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=-6.7 required=5.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,RCVD_IN_DNSWL_HI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1750121595065146984?= X-GMAIL-MSGID: =?utf-8?q?1750121595065146984?= |
Series |
error-injection: Add prompt for function error injection
|
|
Commit Message
Steven Rostedt
Nov. 21, 2022, 3:44 p.m. UTC
From: "Steven Rostedt (Google)" <rostedt@goodmis.org> The config to be able to inject error codes into any function annotated with ALLOW_ERROR_INJECTION() is enabled when CONFIG_FUNCTION_ERROR_INJECTION is enabled. But unfortunately, this is always enabled on x86 when KPROBES is enabled, and there's no way to turn it off. As kprobes is useful for observability of the kernel, it is useful to have it enabled in production environments. But error injection should be avoided. Add a prompt to the config to allow it to be disabled even when kprobes is enabled, and get rid of the "def_bool y". This is a kernel debug feature (it's in Kconfig.debug), and should have never been something enabled by default. Cc: stable@vger.kernel.org Fixes: 540adea3809f6 ("error-injection: Separate error-injection from kprobe") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> --- lib/Kconfig.debug | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-)
Comments
On Mon, Nov 21, 2022 at 10:44:03AM -0500, Steven Rostedt wrote: > From: "Steven Rostedt (Google)" <rostedt@goodmis.org> > > The config to be able to inject error codes into any function annotated > with ALLOW_ERROR_INJECTION() is enabled when CONFIG_FUNCTION_ERROR_INJECTION > is enabled. But unfortunately, this is always enabled on x86 when KPROBES > is enabled, and there's no way to turn it off. > > As kprobes is useful for observability of the kernel, it is useful to have > it enabled in production environments. But error injection should be > avoided. Add a prompt to the config to allow it to be disabled even when > kprobes is enabled, and get rid of the "def_bool y". > > This is a kernel debug feature (it's in Kconfig.debug), and should have > never been something enabled by default. > > Cc: stable@vger.kernel.org > Fixes: 540adea3809f6 ("error-injection: Separate error-injection from kprobe") > Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> > --- > lib/Kconfig.debug | 8 +++++++- > 1 file changed, 7 insertions(+), 1 deletion(-) As stated on another thread, debugging production kernels where folks have been injecting errors into functions is not something anyone would like to and have to do. Especially if from looking at system dumps, it is not even clear what has been injected and why, rendering the system unstable and the issue probably unreproducible. Acked-by: Borislav Petkov <bp@suse.de> Thx.
On Mon, 21 Nov 2022 10:44:03 -0500 Steven Rostedt <rostedt@goodmis.org> wrote: > From: "Steven Rostedt (Google)" <rostedt@goodmis.org> > > The config to be able to inject error codes into any function annotated > with ALLOW_ERROR_INJECTION() is enabled when CONFIG_FUNCTION_ERROR_INJECTION > is enabled. But unfortunately, this is always enabled on x86 when KPROBES > is enabled, and there's no way to turn it off. > > As kprobes is useful for observability of the kernel, it is useful to have > it enabled in production environments. But error injection should be > avoided. Add a prompt to the config to allow it to be disabled even when > kprobes is enabled, and get rid of the "def_bool y". > > This is a kernel debug feature (it's in Kconfig.debug), and should have > never been something enabled by default. > Oops, thanks for update. Yes, it should not be enabled in the production system. Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Thanks, > Cc: stable@vger.kernel.org > Fixes: 540adea3809f6 ("error-injection: Separate error-injection from kprobe") > Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> > --- > lib/Kconfig.debug | 8 +++++++- > 1 file changed, 7 insertions(+), 1 deletion(-) > > diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug > index c3c0b077ade3..9ee72d8860c3 100644 > --- a/lib/Kconfig.debug > +++ b/lib/Kconfig.debug > @@ -1874,8 +1874,14 @@ config NETDEV_NOTIFIER_ERROR_INJECT > If unsure, say N. > > config FUNCTION_ERROR_INJECTION > - def_bool y > + bool "Fault-injections of functions" > depends on HAVE_FUNCTION_ERROR_INJECTION && KPROBES > + help > + Add fault injections into various functions that are annotated with > + ALLOW_ERROR_INJECTION() in the kernel. BPF may also modify the return > + value of theses functions. This is useful to test error paths of code. > + > + If unsure, say N > > config FAULT_INJECTION > bool "Fault-injection framework" > -- > 2.35.1 >
On Mon, Nov 21, 2022 at 11:32 AM Borislav Petkov <bp@alien8.de> wrote: > > On Mon, Nov 21, 2022 at 10:44:03AM -0500, Steven Rostedt wrote: > > From: "Steven Rostedt (Google)" <rostedt@goodmis.org> > > > > The config to be able to inject error codes into any function annotated > > with ALLOW_ERROR_INJECTION() is enabled when CONFIG_FUNCTION_ERROR_INJECTION > > is enabled. But unfortunately, this is always enabled on x86 when KPROBES > > is enabled, and there's no way to turn it off. > > > > As kprobes is useful for observability of the kernel, it is useful to have > > it enabled in production environments. But error injection should be > > avoided. Add a prompt to the config to allow it to be disabled even when > > kprobes is enabled, and get rid of the "def_bool y". > > > > This is a kernel debug feature (it's in Kconfig.debug), and should have > > never been something enabled by default. > > > > Cc: stable@vger.kernel.org > > Fixes: 540adea3809f6 ("error-injection: Separate error-injection from kprobe") > > Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> > > --- > > lib/Kconfig.debug | 8 +++++++- > > 1 file changed, 7 insertions(+), 1 deletion(-) > > As stated on another thread, debugging production kernels where folks > have been injecting errors into functions is not something anyone would > like to and have to do. Especially if from looking at system dumps, it > is not even clear what has been injected and why, rendering the system > unstable and the issue probably unreproducible. > > Acked-by: Borislav Petkov <bp@suse.de> The commit log is bogus and the lack of understanding what bpf and error injection hooks are used for expressed in this thread is quite sad. Disabling error injection makes the system _less_ secure. But giving people an option to reduce security is a decision that every distro and data center has to make on their own.
On Mon, 21 Nov 2022 15:36:08 -0800 Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote: > On Mon, Nov 21, 2022 at 11:32 AM Borislav Petkov <bp@alien8.de> wrote: > > > > On Mon, Nov 21, 2022 at 10:44:03AM -0500, Steven Rostedt wrote: > > > From: "Steven Rostedt (Google)" <rostedt@goodmis.org> > > > > > > The config to be able to inject error codes into any function annotated > > > with ALLOW_ERROR_INJECTION() is enabled when CONFIG_FUNCTION_ERROR_INJECTION > > > is enabled. But unfortunately, this is always enabled on x86 when KPROBES > > > is enabled, and there's no way to turn it off. > > > > > > As kprobes is useful for observability of the kernel, it is useful to have > > > it enabled in production environments. But error injection should be > > > avoided. Add a prompt to the config to allow it to be disabled even when > > > kprobes is enabled, and get rid of the "def_bool y". > > > > > > This is a kernel debug feature (it's in Kconfig.debug), and should have > > > never been something enabled by default. > > > > > > Cc: stable@vger.kernel.org > > > Fixes: 540adea3809f6 ("error-injection: Separate error-injection from kprobe") > > > Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> > > > --- > > > lib/Kconfig.debug | 8 +++++++- > > > 1 file changed, 7 insertions(+), 1 deletion(-) > > > > As stated on another thread, debugging production kernels where folks > > have been injecting errors into functions is not something anyone would > > like to and have to do. Especially if from looking at system dumps, it > > is not even clear what has been injected and why, rendering the system > > unstable and the issue probably unreproducible. > > > > Acked-by: Borislav Petkov <bp@suse.de> > > The commit log is bogus and the lack of understanding what > bpf and error injection hooks are used for expressed > in this thread is quite sad. > Disabling error injection makes the system _less_ secure. Why? I thought this was only used for testing. Or, are you using this for changing the kernel behavior in production environment? For example, the commit 540adea3809f6 ("error-injection: Separate error-injection from kprobe") specifies that some btrfs functions to whitelist, which is I thought only for the testing btrfs. Now it seems more functions related to syscalls registered to the whitelist. (I didn't notice that...) If it is intended to filter syscalls, I recommend you to use secomp instead of this. > But giving people an option to reduce security is a decision > that every distro and data center has to make on their own. This function-level override should be used carefully just for testing linux kernel code. For forcibly filtering some functionality, it should use LSM or have another facility based on jump label. (yeah, if it is for security, why can you use LSM?) Thank you,
On Mon, 21 Nov 2022 15:36:08 -0800 Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote: > The commit log is bogus and the lack of understanding what > bpf and error injection hooks are used for expressed > in this thread is quite sad. > Disabling error injection makes the system _less_ secure. Please specify. As Masami replied, you are abusing this feature for some arcane way to do security. It's "error injection" how does enabling this improve security??? -- Steve > But giving people an option to reduce security is a decision > that every distro and data center has to make on their own.
On Mon, 21 Nov 2022 19:24:23 -0500 Steven Rostedt <rostedt@goodmis.org> wrote: > On Mon, 21 Nov 2022 15:36:08 -0800 > Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote: > > > The commit log is bogus and the lack of understanding what > > bpf and error injection hooks are used for expressed > > in this thread is quite sad. > > Disabling error injection makes the system _less_ secure. > > Please specify. > > As Masami replied, you are abusing this feature for some arcane way to do > security. It's "error injection" how does enabling this improve security??? > If you want to add BPF programs to determine who or what can access various functions, then please work with the security folks and hook into their infrastructure. Please do not make some home grown operations on top of an interface that was not created for this purpose. That can only lead to unexpected consequences. -- Steve
On Mon, Nov 21, 2022 at 03:36:08PM -0800, Alexei Starovoitov wrote:
> The commit log is bogus and the lack of understanding what
You mean that:
Documentation/fault-injection/fault-injection.rst
?
I don't want any of that possible in production setups. And until you
give me a sane argument why it is good to have in production setups
generically, this is end of story.
On 11/22/22 5:39 AM, Borislav Petkov wrote: > On Mon, Nov 21, 2022 at 03:36:08PM -0800, Alexei Starovoitov wrote: >> The commit log is bogus and the lack of understanding what > > You mean that: > > Documentation/fault-injection/fault-injection.rst > > ? > > I don't want any of that possible in production setups. And until you > give me a sane argument why it is good to have in production setups > generically, this is end of story. > I think there are a few different sides to this: - it makes total sense that we all have wildly different ideas about which tools should be available in prod. Making this decision more fine grained seems reasonable. - fault injection for testing: we have a stage of qualification that does error injection against the prod kernel. It helps to have this against the debug kernel too, but that misses some races etc. I always just assumed distros and partners did some fault injection tests against the prod kernel builds? - fault injection for debugging: it doesn't happen often but at some point we run out of ideas and start making different functions fail in prod to figure out why we're not prodding. - overriding return values for security fixes: also not a common thing, but it's a tool we've used. There are usually better long term fixes, but it happens. Stepping back to the big picture of debugging systems with bpf in use, I love hearing (and telling) stories of debugging difficult problems. As far as I know, BPF telling lies hasn't really been a problem for us, so even though it's a huge tangent, if you have specific examples of problems you've seen, I'm really interested in hearing more. When I talk about production, both overall stability and validating new kernels, if I compare the BPF subsystem with MM, filesystems, cgroups, the scheduler, networking, and all things Jens, the systems BPF developers put in place are working really well for me. If I expand the discussion to the BPF programs themselves, there have been rare issues. Still completely on par with the rest of the kernel subsystems and within the noise in comparison with hardware failures. In other words, I really do care about the concerns you're expressing here, and I'm usually first in line to complain when random people make my job harder. I'm just not seeing these issues with BPF, and I see them actively trying to increase safety over time. -chris
On Tue, Nov 22, 2022 at 12:42:33PM -0500, Chris Mason wrote: > I think there are a few different sides to this: > > - it makes total sense that we all have wildly different ideas about > which tools should be available in prod. Making this decision more fine > grained seems reasonable. > > - fault injection for testing: we have a stage of qualification that > does error injection against the prod kernel. It helps to have this > against the debug kernel too, but that misses some races etc. I always > just assumed distros and partners did some fault injection tests against > the prod kernel builds? That's what the debug kernel flavor is for. At least on SLES. That's why we have the MCE injection module in the debug flavor and not in the production one. For the very same reason. > - overriding return values for security fixes: also not a common thing, > but it's a tool we've used. There are usually better long term fixes, > but it happens. Yeah, that's what live patching is for. > In other words, I really do care about the concerns you're expressing > here, and I'm usually first in line to complain when random people make > my job harder. I'm just not seeing these issues with BPF, and I see > them actively trying to increase safety over time. So this might be your opinion and I respect it but your first paragraph was spot on: to *have* the option to decide whether a company wants to support that in production or not. I'm sure it makes sense for you in your production scenarios but it doesn't for us. At least not at this point. And I think this should be disabled in our kernels for now. When the team decides someday that they wanna deal with bug reports of people doing fault injection, then sure by all means.
On Tue, 22 Nov 2022 12:42:33 -0500 Chris Mason <clm@meta.com> wrote: > On 11/22/22 5:39 AM, Borislav Petkov wrote: > > On Mon, Nov 21, 2022 at 03:36:08PM -0800, Alexei Starovoitov wrote: > >> The commit log is bogus and the lack of understanding what > > > > You mean that: > > > > Documentation/fault-injection/fault-injection.rst > > > > ? > > > > I don't want any of that possible in production setups. And until you > > give me a sane argument why it is good to have in production setups > > generically, this is end of story. > > > > I think there are a few different sides to this: > > - it makes total sense that we all have wildly different ideas about > which tools should be available in prod. Making this decision more fine > grained seems reasonable. > > - fault injection for testing: we have a stage of qualification that > does error injection against the prod kernel. It helps to have this > against the debug kernel too, but that misses some races etc. I always > just assumed distros and partners did some fault injection tests against > the prod kernel builds? > > - fault injection for debugging: it doesn't happen often but at some > point we run out of ideas and start making different functions fail in > prod to figure out why we're not prodding. As you have stated, we have different ideas for production. Your POV is cloud based (as is with other parts of my company). But my POV is Chromebooks where production means what's on a user's device. There's no reason to ever have fault injection enabled in such cases. I would assume that distributions are the same. But having kprobes for visibility can also be useful for debugging purposes, even in the field. > > - overriding return values for security fixes: also not a common thing, > but it's a tool we've used. There are usually better long term fixes, > but it happens. > > Stepping back to the big picture of debugging systems with bpf in use, I > love hearing (and telling) stories of debugging difficult problems. As > far as I know, BPF telling lies hasn't really been a problem for us, so > even though it's a huge tangent, if you have specific examples of > problems you've seen, I'm really interested in hearing more. > > When I talk about production, both overall stability and validating new > kernels, if I compare the BPF subsystem with MM, filesystems, cgroups, > the scheduler, networking, and all things Jens, the systems BPF > developers put in place are working really well for me. > > If I expand the discussion to the BPF programs themselves, there have > been rare issues. Still completely on par with the rest of the kernel > subsystems and within the noise in comparison with hardware failures. > > In other words, I really do care about the concerns you're expressing > here, and I'm usually first in line to complain when random people make > my job harder. I'm just not seeing these issues with BPF, and I see > them actively trying to increase safety over time. I'm sure you are not seeing theses issues with BPF, as the main developers and you have the same focus areas. I have no problem with the concept of BPF. My concern is mostly the development side of it. As you can basically attach functionality to arbitrary points in the kernel via BPF programs, the perception is that anything that is available is fair game. BPF tends to expand features beyond their intended usage. Heck, look at the name itself. "extended Berkeley Packet Filter", were eBPF has nothing to do with packet filtering anymore. Perhaps it should be renamed to CUST (Compiled Use Space Trampoline) ;-) Alexei said it's "sad" about my expression of BPF and error injection. If it has to do with security, then I would like to see more collaboration with the security folks and perhaps have BPF integrate with their infrastructure. But the usual response is "that's not fast enough for me" and then something is done from scratch without working with that subsystem to make it fast enough. Yes, it takes more time to collaborate than just doing it on your own. But that's the nature of an open source *community*. -- Steve
On 11/22/22 1:29 PM, Steven Rostedt wrote: > On Tue, 22 Nov 2022 12:42:33 -0500 > Chris Mason <clm@meta.com> wrote: > >> On 11/22/22 5:39 AM, Borislav Petkov wrote: >>> On Mon, Nov 21, 2022 at 03:36:08PM -0800, Alexei Starovoitov wrote: >>>> The commit log is bogus and the lack of understanding what >>> >>> You mean that: >>> >>> Documentation/fault-injection/fault-injection.rst >>> >>> ? >>> >>> I don't want any of that possible in production setups. And until you >>> give me a sane argument why it is good to have in production setups >>> generically, this is end of story. >>> >> >> I think there are a few different sides to this: >> >> - it makes total sense that we all have wildly different ideas about >> which tools should be available in prod. Making this decision more fine >> grained seems reasonable. >> >> - fault injection for testing: we have a stage of qualification that >> does error injection against the prod kernel. It helps to have this >> against the debug kernel too, but that misses some races etc. I always >> just assumed distros and partners did some fault injection tests against >> the prod kernel builds? >> >> - fault injection for debugging: it doesn't happen often but at some >> point we run out of ideas and start making different functions fail in >> prod to figure out why we're not prodding. > > As you have stated, we have different ideas for production. Your POV is > cloud based (as is with other parts of my company). But my POV is > Chromebooks where production means what's on a user's device. There's no > reason to ever have fault injection enabled in such cases. I would assume > that distributions are the same. But having kprobes for visibility can also > be useful for debugging purposes, even in the field. > Yeah, I definitely don't have opinions on the right way to build a chromebook, and replying to Boris, only slightly better at distros. Josef's original intent was this be easy to turn off. >> >> - overriding return values for security fixes: also not a common thing, >> but it's a tool we've used. There are usually better long term fixes, >> but it happens. >> >> Stepping back to the big picture of debugging systems with bpf in use, I >> love hearing (and telling) stories of debugging difficult problems. As >> far as I know, BPF telling lies hasn't really been a problem for us, so >> even though it's a huge tangent, if you have specific examples of >> problems you've seen, I'm really interested in hearing more. >> >> When I talk about production, both overall stability and validating new >> kernels, if I compare the BPF subsystem with MM, filesystems, cgroups, >> the scheduler, networking, and all things Jens, the systems BPF >> developers put in place are working really well for me. >> >> If I expand the discussion to the BPF programs themselves, there have >> been rare issues. Still completely on par with the rest of the kernel >> subsystems and within the noise in comparison with hardware failures. >> >> In other words, I really do care about the concerns you're expressing >> here, and I'm usually first in line to complain when random people make >> my job harder. I'm just not seeing these issues with BPF, and I see >> them actively trying to increase safety over time. > > I'm sure you are not seeing theses issues with BPF, as the main developers > and you have the same focus areas. > > I have no problem with the concept of BPF. My concern is mostly the > development side of it. As you can basically attach functionality to > arbitrary points in the kernel via BPF programs, the perception is that > anything that is available is fair game. BPF tends to expand features > beyond their intended usage. Heck, look at the name itself. "extended > Berkeley Packet Filter", were eBPF has nothing to do with packet filtering > anymore. Perhaps it should be renamed to CUST (Compiled Use Space > Trampoline) ;-) Developers in general tend to stretch interfaces a lot. At some point the friction of using the interface is worse than the friction of changing it, and things get redone. At the end of the day, BPF developers are still kernel developers and we end up with relatively sane feedback loops. > > Alexei said it's "sad" about my expression of BPF and error injection. If > it has to do with security, then I would like to see more collaboration > with the security folks and perhaps have BPF integrate with their > infrastructure. Now is a great time to grab KP and hear all about BPF LSM. > But the usual response is "that's not fast enough for me" > and then something is done from scratch without working with that > subsystem to make it fast enough. Yes, it takes more time to collaborate > than just doing it on your own. But that's the nature of an open source > *community*. One of the awkward and wonderful parts of our community is that none of us have the same goals or needs. Going back to the original thread, ARM has either one or two different live patching subsystems in use in the industry, and neither is upstream. One reason you end up having these arguments often with BPF is because they stick around and work with the community to upstream their work. The tradeoffs, compromises and decisions aren't always what you want, but we all show up every day and keep engaging. -chris
On Tue, 22 Nov 2022 14:51:08 -0500 Chris Mason <clm@meta.com> wrote: > On 11/22/22 1:29 PM, Steven Rostedt wrote: > > On Tue, 22 Nov 2022 12:42:33 -0500 > > Chris Mason <clm@meta.com> wrote: > > > >> On 11/22/22 5:39 AM, Borislav Petkov wrote: > >>> On Mon, Nov 21, 2022 at 03:36:08PM -0800, Alexei Starovoitov wrote: > >>>> The commit log is bogus and the lack of understanding what Why am I not understanding the controversy here? With this patch applied, people who want function error injection enable CONFIG_FUNCTION_ERROR_INJECTION and people who don't want it don't do that. Alexei, can you please suggest a less bogus changelog for this?
Hi, On Tue, 22 Nov 2022 12:42:33 -0500 Chris Mason <clm@meta.com> wrote: > > - fault injection for testing: we have a stage of qualification that > does error injection against the prod kernel. It helps to have this > against the debug kernel too, but that misses some races etc. I always > just assumed distros and partners did some fault injection tests against > the prod kernel builds? > > - fault injection for debugging: it doesn't happen often but at some > point we run out of ideas and start making different functions fail in > prod to figure out why we're not prodding. For those purpose, isn't it enough to add a taint flag for the fault injection? This will help us to identify that the kernel is possible to be under debug mode. > - overriding return values for security fixes: also not a common thing, > but it's a tool we've used. There are usually better long term fixes, > but it happens. I don't recommend to use the fault injection for this purpose. For fixing the security issue online, you should use livepatch. Thank you,
On Wed, Nov 30, 2022 at 2:37 PM Andrew Morton <akpm@linux-foundation.org> wrote: > > On Tue, 22 Nov 2022 14:51:08 -0500 Chris Mason <clm@meta.com> wrote: > > > On 11/22/22 1:29 PM, Steven Rostedt wrote: > > > On Tue, 22 Nov 2022 12:42:33 -0500 > > > Chris Mason <clm@meta.com> wrote: > > > > > >> On 11/22/22 5:39 AM, Borislav Petkov wrote: > > >>> On Mon, Nov 21, 2022 at 03:36:08PM -0800, Alexei Starovoitov wrote: > > >>>> The commit log is bogus and the lack of understanding what > > Why am I not understanding the controversy here? With this patch > applied, people who want function error injection enable > CONFIG_FUNCTION_ERROR_INJECTION and people who don't want it don't do > that. > > Alexei, can you please suggest a less bogus changelog for this? People are using ALLOW_ERROR_INJECTION to allowlist kernel functions where bpf progs can attach. For example in the linux-next: git grep ALLOW_ERROR_INJECTION drivers/hid/bpf/hid_bpf_dispatch.c:ALLOW_ERROR_INJECTION(hid_bpf_device_event, ERRNO); drivers/hid/bpf/hid_bpf_dispatch.c:ALLOW_ERROR_INJECTION(hid_bpf_rdesc_fixup, ERRNO); drivers/hid/bpf/hid_bpf_jmp_table.c:ALLOW_ERROR_INJECTION(__hid_bpf_tail_call, ERRNO); The hid-bpf framework depends on it. iirc Benjamin mentioned that chromeos is one of the future users of hid-bpf. They need it to deal with a variety of quirks in hid devices in production. Either hid-bpf or bpf core can add "depends on FUNCTION_ERROR_INJECTION" to its kconfig. Like: diff --git a/kernel/bpf/Kconfig b/kernel/bpf/Kconfig index 2dfe1079f772..281e5263f3d1 100644 --- a/kernel/bpf/Kconfig +++ b/kernel/bpf/Kconfig @@ -32,6 +32,7 @@ config BPF_SYSCALL select BINARY_PRINTF select NET_SOCK_MSG if NET select PAGE_POOL if NET + depends on FUNCTION_ERROR_INJECTION default n but the better option for now would be to drop this patch. For the next next merge window we can come up with alternative way (instead of ALLOW_ERROR_INJECTION) to mark kernel functions purely on the bpf side. I don't think we have time to add this marking infrastructure for the upcoming merge window.
On Thu, Dec 1, 2022 at 5:59 PM Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote: > > On Wed, Nov 30, 2022 at 2:37 PM Andrew Morton <akpm@linux-foundation.org> wrote: > > > > On Tue, 22 Nov 2022 14:51:08 -0500 Chris Mason <clm@meta.com> wrote: > > > > > On 11/22/22 1:29 PM, Steven Rostedt wrote: > > > > On Tue, 22 Nov 2022 12:42:33 -0500 > > > > Chris Mason <clm@meta.com> wrote: > > > > > > > >> On 11/22/22 5:39 AM, Borislav Petkov wrote: > > > >>> On Mon, Nov 21, 2022 at 03:36:08PM -0800, Alexei Starovoitov wrote: > > > >>>> The commit log is bogus and the lack of understanding what > > > > Why am I not understanding the controversy here? With this patch > > applied, people who want function error injection enable > > CONFIG_FUNCTION_ERROR_INJECTION and people who don't want it don't do > > that. > > > > Alexei, can you please suggest a less bogus changelog for this? > > People are using ALLOW_ERROR_INJECTION to allowlist kernel functions > where bpf progs can attach. > For example in the linux-next: > git grep ALLOW_ERROR_INJECTION > drivers/hid/bpf/hid_bpf_dispatch.c:ALLOW_ERROR_INJECTION(hid_bpf_device_event, > ERRNO); > drivers/hid/bpf/hid_bpf_dispatch.c:ALLOW_ERROR_INJECTION(hid_bpf_rdesc_fixup, > ERRNO); > drivers/hid/bpf/hid_bpf_jmp_table.c:ALLOW_ERROR_INJECTION(__hid_bpf_tail_call, > ERRNO); To be completely honest, I mark hid_bpf_device_event and hid_bpf_rdesc_fixup as ALLOW_ERROR_INJECTION only to have a dedicated function to create a SEC("fmod_ret"), but I am actually only using __hid_bpf_tail_call() to call the bpf programs. So technically, I should be able to not use ALLOW_ERROR_INJECTION but that would mean manually calling the BPF programs from __hid_bpf_tail_call() instead of relying on the magic of libbpf, but also would force me to have an other way of telling these 2 other functions are fmod_ret capable, which would be definitely not clean. > > The hid-bpf framework depends on it. > iirc Benjamin mentioned that chromeos is one of the future users > of hid-bpf. They need it to deal with a variety of quirks in hid > devices in production. > > Either hid-bpf or bpf core can add > "depends on FUNCTION_ERROR_INJECTION" > to its kconfig. > Like: > diff --git a/kernel/bpf/Kconfig b/kernel/bpf/Kconfig > index 2dfe1079f772..281e5263f3d1 100644 > --- a/kernel/bpf/Kconfig > +++ b/kernel/bpf/Kconfig > @@ -32,6 +32,7 @@ config BPF_SYSCALL > select BINARY_PRINTF > select NET_SOCK_MSG if NET > select PAGE_POOL if NET > + depends on FUNCTION_ERROR_INJECTION > default n FWIW, this is what I'm going to apply in hid.git for the time being [0]. But I'd rather have a BPF_HAVE_FMOD_RET as suggested in [1]. > > but the better option for now would be to drop this patch. > For the next next merge window we can come up with alternative way > (instead of ALLOW_ERROR_INJECTION) to mark kernel functions > purely on the bpf side. > I don't think we have time to add this marking infrastructure > for the upcoming merge window. > Outside of the "should we add ALLOW_ERROR_INJECTION in production environments", all I care about is that I want to be able to attach SEC("fmod_ret/...") on a specific set of functions that I control. So for me, I don't need to full "let's randomly add errors in any functions" (which I doubt you can do with ALLOW_ERROR_INJECTION anyway), I just need to be able to say "I want a bpf program to be able to change the return code of this dedicated function". So I agree with Alexei here. The situation has been to enable this parameter for quite some time without any complaints, and this patch prevents HID-BPF to be enabled on systems where sysadmins would think this is unsafe. Postponing this patch to the next merge window will give enough time for the BPF folks to change their implementation. Cheers, Benjamin [0] https://lore.kernel.org/linux-input/7df26319-f4ee-6dd1-a1b8-1caaf595528d@nvidia.com/T/#m9416ad54e2ef63244585c4ef83d07bebedf6e143 [1] https://lore.kernel.org/linux-input/CABRcYmKyRchQhabi1Vd9RcMQFCcb=EtWyEbFDFRTc-L-U8WhgA@mail.gmail.com/
On Thu, 1 Dec 2022 08:58:55 -0800 Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote: > but the better option for now would be to drop this patch. > For the next next merge window we can come up with alternative way > (instead of ALLOW_ERROR_INJECTION) to mark kernel functions > purely on the bpf side. > I don't think we have time to add this marking infrastructure > for the upcoming merge window. OK, thanks, dropped. Let's revisit this next cycle.
On Thu, Dec 1, 2022 at 8:59 AM Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote: > > The hid-bpf framework depends on it. Ok, this is completely unacceptably disgusting hack. That needs fixing. > Either hid-bpf or bpf core can add > "depends on FUNCTION_ERROR_INJECTION" No, it needs to be narrowed down a lot. Nobody sane wants error injection just because they want some random HID thing. And no, BPF shouldn't need it either. This needs to be narrowed down to the point where HID can say "I want *this* particular call to be able to be a bpf call. Stop this crazy "bpf / hid needs error injection" when that then implies a _lot_ more than that, plus is documented to be something entirely different anyway. I realize that HID has mis-used the "we could just use error injection here to instead insert random bpf code", but that's a complete hack. Plus it seems to happily not even have made it into mainline anyway, and only exists in linux-next. Let's head that disgusting hack off at the pass. I'm going to apply Steven's patch, because honestly, we need to fix this disgusting mess *before* it gets to mainline, rather than say "oh, we already have broken users in next, so let's bend over backwards for that". The code is called "error injection", not "random bpf extension" Linus
On Thu, 1 Dec 2022, Linus Torvalds wrote: > > The hid-bpf framework depends on it. > > Ok, this is completely unacceptably disgusting hack. > > That needs fixing. > > > Either hid-bpf or bpf core can add > > "depends on FUNCTION_ERROR_INJECTION" > > No, it needs to be narrowed down a lot. Nobody sane wants error > injection just because they want some random HID thing. > > And no, BPF shouldn't need it either. > > This needs to be narrowed down to the point where HID can say "I want > *this* particular call to be able to be a bpf call. > > Stop this crazy "bpf / hid needs error injection" when that then > implies a _lot_ more than that, plus is documented to be something > entirely different anyway. > > I realize that HID has mis-used the "we could just use error injection > here to instead insert random bpf code", but that's a complete hack. > > Plus it seems to happily not even have made it into mainline anyway, > and only exists in linux-next. Let's head that disgusting hack off at > the pass. > > I'm going to apply Steven's patch, because honestly, we need to fix > this disgusting mess *before* it gets to mainline, rather than say > "oh, we already have broken users in next, so let's bend over > backwards for that". > > The code is called "error injection", not "random bpf extension" Seems like quite a few parallel threads are currently going on about this, so it's a little bit hard to catch up for me as I am apparently CCed only on some of them. Anyway, I believe [1] that ERROR_INJECTION has been designed as a debugging feature in the first place, and should stay so. After figuring out now that HID-BPF actually has hard dependence on it, I fully agree [2] that the series should be ditched for 6.2 and will work with Benjamin to have it removed from current hid.git#for-next. [1] https://lore.kernel.org/all/nycvar.YEU.7.76.2211211716270.27249@gjva.wvxbf.pm/ [2] https://lore.kernel.org/lkml/nycvar.YFH.7.76.2212020135390.6045@cbobk.fhfr.pm/
On Thu, Dec 1, 2022 at 4:46 PM Jiri Kosina <jikos@kernel.org> wrote: > > Anyway, I believe [1] that ERROR_INJECTION has been designed as a > debugging feature in the first place, and should stay so. After figuring > out now that HID-BPF actually has hard dependence on it, I fully agree [2] > that the series should be ditched for 6.2 and will work with Benjamin to > have it removed from current hid.git#for-next. I do think that it is interesting to have a "let's have a bpf insertion hook here", so I'm not against the _concept_ of HID doing that. It's not so different from user-mode drivers, after all, which we also have. A kind of half-way state where we have a kernel driver, but one that may need custom site-specific (or machine-specific) tweaks. So I don't want to come across as being against having bpf used for tuning some HID issue (and I can imagine it making sense in other places that have machine-specific tweaks - I'm thinking of all the thermal probe or pincontrol mess where sometimes you have GPIO's or motherboard thermal sensors etc that are literally "user connected it to X"). But the notion that we'd use some error injection framework for it, and that you'd mix those concepts up - *that* I really think is just horrendous. Because even if you end up using some common infrastructure code, we really should separate things out much better. Linus
On Thu, 1 Dec 2022, Linus Torvalds wrote: > > Anyway, I believe [1] that ERROR_INJECTION has been designed as a > > debugging feature in the first place, and should stay so. After figuring > > out now that HID-BPF actually has hard dependence on it, I fully agree [2] > > that the series should be ditched for 6.2 and will work with Benjamin to > > have it removed from current hid.git#for-next. > > I do think that it is interesting to have a "let's have a bpf > insertion hook here", so I'm not against the _concept_ of HID doing > that. Absolutely, me neither, quite the contrary -- I am quite happy to see HID-BPF happening, because it'll actually make life easier for everybody: for people with quirky hardware (trivial testing of fixes), for kernel developers (trivial testing of fixes), and for distributions (trivial distribution of fixes). > It's not so different from user-mode drivers, after all, which we also > have. A kind of half-way state where we have a kernel driver, but one > that may need custom site-specific (or machine-specific) tweaks. Indeed. The whole rationale from Benjamin, explaining quite nicely why HID-BPF is a good thing, can be found in the very original, initial ancient cover letter: https://lore.kernel.org/lkml/20220224110828.2168231-1-benjamin.tissoires@redhat.com/ > So I don't want to come across as being against having bpf used for > tuning some HID issue (and I can imagine it making sense in other places > that have machine-specific tweaks - I'm thinking of all the thermal > probe or pincontrol mess where sometimes you have GPIO's or motherboard > thermal sensors etc that are literally "user connected it to X"). > > But the notion that we'd use some error injection framework for it, > and that you'd mix those concepts up - *that* I really think is just > horrendous. Fully agreed. I unfortunately missed that particular aspect during review, and it popped up only after HID-BPF appeared in linux-next.
On Fri, 2 Dec 2022 02:03:03 +0100 (CET) Jiri Kosina <jikos@kernel.org> wrote: > On Thu, 1 Dec 2022, Linus Torvalds wrote: > > > > Anyway, I believe [1] that ERROR_INJECTION has been designed as a > > > debugging feature in the first place, and should stay so. After figuring > > > out now that HID-BPF actually has hard dependence on it, I fully agree [2] > > > that the series should be ditched for 6.2 and will work with Benjamin to > > > have it removed from current hid.git#for-next. > > > > I do think that it is interesting to have a "let's have a bpf > > insertion hook here", so I'm not against the _concept_ of HID doing > > that. > > Absolutely, me neither, quite the contrary -- I am quite happy to see > HID-BPF happening, because it'll actually make life easier for everybody: > for people with quirky hardware (trivial testing of fixes), for kernel > developers (trivial testing of fixes), and for distributions (trivial > distribution of fixes). Full disclosure, I'm not against a bpf_hook either. In fact, I think I even stated something to that effect, like adding a bpf_hook annotation to functions or whatever, so that people can plainly see that the function can have bpf attached to it. I just *hate* the ad hoc way of using infrastructure for other purposes than what they were designed for. -- Steve
On Thu, Dec 01, 2022 at 01:13:35PM -0800, Linus Torvalds wrote: > > I'm going to apply Steven's patch, because honestly, we need to fix > this disgusting mess *before* it gets to mainline, rather than say > "oh, we already have broken users in next, so let's bend over > backwards for that". > > The code is called "error injection", not "random bpf extension" Right, ALLOW_ERROR_INJECTION doesn't fit hid-bpf. As Benjamin clarified for hid-bpf we don't want non-bpf attach to those 3 functions. Injecting errors there is not useful. We'll come with some clean mechanism to express "attach bpf here". But let's examine other places of "error injection". The following are clearly falling into kernel debugging category: mm/filemap.c:ALLOW_ERROR_INJECTION(__filemap_add_folio, ERRNO); mm/page_alloc.c:ALLOW_ERROR_INJECTION(should_fail_alloc_page, TRUE); mm/slab_common.c:ALLOW_ERROR_INJECTION(should_failslab, ERRNO); The distros might disable such hooks while data centers might still enable them. Consider chaosmonkey and other frameworks that stress user space. They are used on non-production user space while running on production kernel. The cloud providers wouldn't want to spin another kernel flavor with fault injection enabled just to satisfy this set of users. So the FUNCTION_ERROR_INJECTION kconfig is absolutely necessary. Whether it defaults to N or Y, doesn't matter much. But where would you draw the line for: include/linux/syscalls.h: ALLOW_ERROR_INJECTION(sys_##sname, ERRNO); Right now people can add to their .bashrc: cd /sys/kernel/debug/fail_function/ echo __x64_sys_bpf > inject echo 0xffffFFFF > times echo 100 > probability echo 0 > verbose and stop their favorite syscall ever happening on their laptops after boot. The fault injection framework disables individual syscall with zero performance overhead comparing to LSM and seccomp mechanisms. BPF is not involved here. It's a kprobe in one spot. All other syscalls don't notice it. It's an attractive way to improve security. A BPF prog over syscall can filter by user, cgroup, task and give fine grain control over security surface. tbh I'm not aware of folks doing "syscall disabling" through command line like above (I've only seen it through bpf), but it doesn't mean that somebody will not start complaining that their script broke, because distro disabled fault injection. So should we split FUNCTION_ERROR_INJECTION kconfig into two ? And do default N for things like should_failslab() and default Y for syscalls? In the other thread TAINT_FAULT_INJECTED was proposed. I think it's fine to taint when injecting errors into should_failslab(), but tainting when syscall is disabled feels wrong. One can argue that ALLOW_ERROR_INJECTION(sys_##sname, ERRNO); was an abuse of fault injection, but let's keep it aside and focus on moving forward from here.
On 12/1/22 22:13, Linus Torvalds wrote: > On Thu, Dec 1, 2022 at 8:59 AM Alexei Starovoitov > <alexei.starovoitov@gmail.com> wrote: >> >> The hid-bpf framework depends on it. > > Ok, this is completely unacceptably disgusting hack. [foreword: I have read the other replies, just replying to this one because it is the explicit ask for a fix] > > That needs fixing. > >> Either hid-bpf or bpf core can add >> "depends on FUNCTION_ERROR_INJECTION" > > No, it needs to be narrowed down a lot. Nobody sane wants error > injection just because they want some random HID thing. > > And no, BPF shouldn't need it either. > > This needs to be narrowed down to the point where HID can say "I want > *this* particular call to be able to be a bpf call. So, would the following be OK? I still have a few concerns I'll explain after the patch. --- From 1290561304eb3e48e1e6d727def13c16698a26f1 Mon Sep 17 00:00:00 2001 From: Benjamin Tissoires <benjamin.tissoires@redhat.com> Date: Fri, 2 Dec 2022 12:40:29 +0100 Subject: [PATCH] bpf: do not rely on ALLOW_ERROR_INJECTION for fmod_ret The current way of expressing that a non-bpf kernel component is willing to accept that bpf programs can be attached to it and that they can change the return value is to abuse ALLOW_ERROR_INJECTION. This is debated in the link below, and the result is that it is not a reasonable thing to do. An easy fix is to keep the list of valid functions in the BPF verifier in the same way we keep the non-sleepable ALLOW_ERROR_INJECTION ones. However, this kind of defeat the point of being able to add bpf APIs in non-BPF subsystems, so we probably need to rethink that part. Link: https://lore.kernel.org/all/20221121104403.1545f9b5@gandalf.local.home/ Suggested-by: Alexei Starovoitov <alexei.starovoitov@gmail.com> Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> --- drivers/hid/bpf/hid_bpf_dispatch.c | 2 -- drivers/hid/bpf/hid_bpf_jmp_table.c | 1 - kernel/bpf/verifier.c | 20 +++++++++++++++++++- 3 files changed, 19 insertions(+), 4 deletions(-) diff --git a/drivers/hid/bpf/hid_bpf_dispatch.c b/drivers/hid/bpf/hid_bpf_dispatch.c index 3c989e74d249..d1f6a1d4ae60 100644 --- a/drivers/hid/bpf/hid_bpf_dispatch.c +++ b/drivers/hid/bpf/hid_bpf_dispatch.c @@ -44,7 +44,6 @@ __weak noinline int hid_bpf_device_event(struct hid_bpf_ctx *ctx) { return 0; } -ALLOW_ERROR_INJECTION(hid_bpf_device_event, ERRNO); u8 * dispatch_hid_bpf_device_event(struct hid_device *hdev, enum hid_report_type type, u8 *data, @@ -105,7 +104,6 @@ __weak noinline int hid_bpf_rdesc_fixup(struct hid_bpf_ctx *ctx) { return 0; } -ALLOW_ERROR_INJECTION(hid_bpf_rdesc_fixup, ERRNO); u8 *call_hid_bpf_rdesc_fixup(struct hid_device *hdev, u8 *rdesc, unsigned int *size) { diff --git a/drivers/hid/bpf/hid_bpf_jmp_table.c b/drivers/hid/bpf/hid_bpf_jmp_table.c index 579a6c06906e..207972b028d9 100644 --- a/drivers/hid/bpf/hid_bpf_jmp_table.c +++ b/drivers/hid/bpf/hid_bpf_jmp_table.c @@ -103,7 +103,6 @@ __weak noinline int __hid_bpf_tail_call(struct hid_bpf_ctx *ctx) { return 0; } -ALLOW_ERROR_INJECTION(__hid_bpf_tail_call, ERRNO); int hid_bpf_prog_run(struct hid_device *hdev, enum hid_bpf_prog_type type, struct hid_bpf_ctx_kern *ctx_kern) diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 225666307bba..4eac440d659f 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -24,6 +24,7 @@ #include <linux/bpf_lsm.h> #include <linux/btf_ids.h> #include <linux/poison.h> +#include <linux/hid_bpf.h> #include "disasm.h" @@ -14827,6 +14828,20 @@ static int check_non_sleepable_error_inject(u32 btf_id) return btf_id_set_contains(&btf_non_sleepable_error_inject, btf_id); } +/* Manually tag fmod_ret functions to not misuse ALLOW_ERROR_INJECTION */ +BTF_SET_START(btf_modify_return) +#if CONFIG_HID_BPF +BTF_ID(func, hid_bpf_device_event) +BTF_ID(func, hid_bpf_rdesc_fixup) +BTF_ID(func, __hid_bpf_tail_call) +#endif /* CONFIG_HID_BPF */ +BTF_SET_END(btf_modify_return) + +static int check_function_modify_return(u32 btf_id) +{ + return btf_id_set_contains(&btf_modify_return, btf_id); +} + int bpf_check_attach_target(struct bpf_verifier_log *log, const struct bpf_prog *prog, const struct bpf_prog *tgt_prog, @@ -15047,7 +15062,10 @@ int bpf_check_attach_target(struct bpf_verifier_log *log, bpf_log(log, "can't modify return codes of BPF programs\n"); return -EINVAL; } - ret = check_attach_modify_return(addr, tname); + ret = -EINVAL; + if (!check_function_modify_return(btf_id) || + check_attach_modify_return(addr, tname)) + ret = 0; if (ret) { bpf_log(log, "%s() is not modifiable\n", tname); return ret;
On Thu, Dec 01, 2022 at 05:41:29PM -0800, Alexei Starovoitov wrote: > > The fault injection framework disables individual syscall with zero performance > overhead comparing to LSM and seccomp mechanisms. > BPF is not involved here. It's a kprobe in one spot. > All other syscalls don't notice it. > It's an attractive way to improve security. > > A BPF prog over syscall can filter by user, cgroup, task and give fine grain > control over security surface. > tbh I'm not aware of folks doing "syscall disabling" through command line like > above (I've only seen it through bpf), but it doesn't mean that somebody will > not start complaining that their script broke, because distro disabled fault > injection. > > So should we split FUNCTION_ERROR_INJECTION kconfig into two ? > And do default N for things like should_failslab() and > default Y for syscalls? How about calling the latter something like bpf syscall hooks, and not using the terminology "error injection" in relation to system calls? I think that might be less confusing. - Ted
On Fri, Dec 02, 2022 at 03:55:38PM +0100, Benjamin Tissoires wrote: > > > On 12/1/22 22:13, Linus Torvalds wrote: > > On Thu, Dec 1, 2022 at 8:59 AM Alexei Starovoitov > > <alexei.starovoitov@gmail.com> wrote: > > > > > > The hid-bpf framework depends on it. > > > > Ok, this is completely unacceptably disgusting hack. > > [foreword: I have read the other replies, just replying to this one > because it is the explicit ask for a fix] > > > > > That needs fixing. > > > > > Either hid-bpf or bpf core can add > > > "depends on FUNCTION_ERROR_INJECTION" > > > > No, it needs to be narrowed down a lot. Nobody sane wants error > > injection just because they want some random HID thing. > > > > And no, BPF shouldn't need it either. > > > > This needs to be narrowed down to the point where HID can say "I want > > *this* particular call to be able to be a bpf call. > > So, would the following be OK? I still have a few concerns I'll explain > after the patch. > > --- > From 1290561304eb3e48e1e6d727def13c16698a26f1 Mon Sep 17 00:00:00 2001 > From: Benjamin Tissoires <benjamin.tissoires@redhat.com> > Date: Fri, 2 Dec 2022 12:40:29 +0100 > Subject: [PATCH] bpf: do not rely on ALLOW_ERROR_INJECTION for fmod_ret > > The current way of expressing that a non-bpf kernel component is willing > to accept that bpf programs can be attached to it and that they can change > the return value is to abuse ALLOW_ERROR_INJECTION. > This is debated in the link below, and the result is that it is not a > reasonable thing to do. > > An easy fix is to keep the list of valid functions in the BPF verifier > in the same way we keep the non-sleepable ALLOW_ERROR_INJECTION ones. > However, this kind of defeat the point of being able to add bpf APIs in > non-BPF subsystems, so we probably need to rethink that part. > > > Link: https://lore.kernel.org/all/20221121104403.1545f9b5@gandalf.local.home/ > Suggested-by: Alexei Starovoitov <alexei.starovoitov@gmail.com> > Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> > --- > drivers/hid/bpf/hid_bpf_dispatch.c | 2 -- > drivers/hid/bpf/hid_bpf_jmp_table.c | 1 - > kernel/bpf/verifier.c | 20 +++++++++++++++++++- > 3 files changed, 19 insertions(+), 4 deletions(-) > > diff --git a/drivers/hid/bpf/hid_bpf_dispatch.c b/drivers/hid/bpf/hid_bpf_dispatch.c > index 3c989e74d249..d1f6a1d4ae60 100644 > --- a/drivers/hid/bpf/hid_bpf_dispatch.c > +++ b/drivers/hid/bpf/hid_bpf_dispatch.c > @@ -44,7 +44,6 @@ __weak noinline int hid_bpf_device_event(struct hid_bpf_ctx *ctx) > { > return 0; > } > -ALLOW_ERROR_INJECTION(hid_bpf_device_event, ERRNO); > u8 * > dispatch_hid_bpf_device_event(struct hid_device *hdev, enum hid_report_type type, u8 *data, > @@ -105,7 +104,6 @@ __weak noinline int hid_bpf_rdesc_fixup(struct hid_bpf_ctx *ctx) > { > return 0; > } > -ALLOW_ERROR_INJECTION(hid_bpf_rdesc_fixup, ERRNO); > u8 *call_hid_bpf_rdesc_fixup(struct hid_device *hdev, u8 *rdesc, unsigned int *size) > { > diff --git a/drivers/hid/bpf/hid_bpf_jmp_table.c b/drivers/hid/bpf/hid_bpf_jmp_table.c > index 579a6c06906e..207972b028d9 100644 > --- a/drivers/hid/bpf/hid_bpf_jmp_table.c > +++ b/drivers/hid/bpf/hid_bpf_jmp_table.c > @@ -103,7 +103,6 @@ __weak noinline int __hid_bpf_tail_call(struct hid_bpf_ctx *ctx) > { > return 0; > } > -ALLOW_ERROR_INJECTION(__hid_bpf_tail_call, ERRNO); > int hid_bpf_prog_run(struct hid_device *hdev, enum hid_bpf_prog_type type, > struct hid_bpf_ctx_kern *ctx_kern) > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c > index 225666307bba..4eac440d659f 100644 > --- a/kernel/bpf/verifier.c > +++ b/kernel/bpf/verifier.c > @@ -24,6 +24,7 @@ > #include <linux/bpf_lsm.h> > #include <linux/btf_ids.h> > #include <linux/poison.h> > +#include <linux/hid_bpf.h> > #include "disasm.h" > @@ -14827,6 +14828,20 @@ static int check_non_sleepable_error_inject(u32 btf_id) > return btf_id_set_contains(&btf_non_sleepable_error_inject, btf_id); > } > +/* Manually tag fmod_ret functions to not misuse ALLOW_ERROR_INJECTION */ > +BTF_SET_START(btf_modify_return) > +#if CONFIG_HID_BPF > +BTF_ID(func, hid_bpf_device_event) > +BTF_ID(func, hid_bpf_rdesc_fixup) > +BTF_ID(func, __hid_bpf_tail_call) > +#endif /* CONFIG_HID_BPF */ > +BTF_SET_END(btf_modify_return) > + > +static int check_function_modify_return(u32 btf_id) > +{ > + return btf_id_set_contains(&btf_modify_return, btf_id); > +} > + > int bpf_check_attach_target(struct bpf_verifier_log *log, > const struct bpf_prog *prog, > const struct bpf_prog *tgt_prog, > @@ -15047,7 +15062,10 @@ int bpf_check_attach_target(struct bpf_verifier_log *log, > bpf_log(log, "can't modify return codes of BPF programs\n"); > return -EINVAL; > } > - ret = check_attach_modify_return(addr, tname); > + ret = -EINVAL; > + if (!check_function_modify_return(btf_id) || > + check_attach_modify_return(addr, tname)) > + ret = 0; > if (ret) { > bpf_log(log, "%s() is not modifiable\n", tname); > return ret; > -- > 2.38.1 > --- > > While this patch removes the need for ALLOW_ERROR_INJECTION it has a > couple of drawbacks: > - suddenly we lose the nice separation of concerns between bpf core and > its users (HID in my case) > - it would need to be changed in 6.3 simply because of the previous > point, so it is just a temporary fix. Agree, but it works short term. A silver lining is BTF_SET approach consumes 4 bytes per mark while ALLOW_ERROR_INJECTION consumes 16 bytes for struct error_injection_entry and then another 48 bytes per mark for struct ei_entry. An alternative would be to define a known prefix like "bpf_modret_" or "bpf_hook_" and rename these three functions. Then there will be no need for #include <linux/hid_bpf.h> in bpf core. > So I am not sure if this would qualify HID-BPF for 6.2. Please speak up. Since that was the only thing I think it's fine to stay in the queue.
On Fri, Dec 02, 2022 at 10:56:52AM -0500, Theodore Ts'o wrote: > On Thu, Dec 01, 2022 at 05:41:29PM -0800, Alexei Starovoitov wrote: > > > > The fault injection framework disables individual syscall with zero performance > > overhead comparing to LSM and seccomp mechanisms. > > BPF is not involved here. It's a kprobe in one spot. > > All other syscalls don't notice it. > > It's an attractive way to improve security. > > > > A BPF prog over syscall can filter by user, cgroup, task and give fine grain > > control over security surface. > > tbh I'm not aware of folks doing "syscall disabling" through command line like > > above (I've only seen it through bpf), but it doesn't mean that somebody will > > not start complaining that their script broke, because distro disabled fault > > injection. > > > > So should we split FUNCTION_ERROR_INJECTION kconfig into two ? > > And do default N for things like should_failslab() and > > default Y for syscalls? > > How about calling the latter something like bpf syscall hooks, and not > using the terminology "error injection" in relation to system calls? > I think that might be less confusing. I think 'syscall error injection' name fits well. It's a generic feature that both kprobes and bpf should be able to use. Here is the patch... Even with this patch we have 7 failures in BPF selftests. We will fix them later with the same mechanism as we will pick for hid-bpf. This patch will keep 'syscall disabling' scripts working and bpf syscall adjustment will work too. So no chance of breaking anyone. While actual error injection inside the kernel will be disabled. Better name suggestions are welcome, of course. From 2960958f91d1134b1a8f27787875f6b9300f205e Mon Sep 17 00:00:00 2001 From: Alexei Starovoitov <ast@kernel.org> Date: Fri, 2 Dec 2022 13:06:08 -0800 Subject: [PATCH] error-injection: Split FUNCTION_ERROR_INJECTION into syscalls and the rest. Split FUNCTION_ERROR_INJECTION into: - SYSCALL_ERROR_INJECTION with default y - FUNC_ERROR_INJECTION with default n. The former is only used to modify return values of syscalls for security and user space testing reasons while the latter is for the rest of error injection in the kernel that should only be used to stress test and debug the kernel. Signed-off-by: Alexei Starovoitov <ast@kernel.org> --- arch/arm64/include/asm/syscall_wrapper.h | 8 ++++---- arch/powerpc/include/asm/syscall_wrapper.h | 4 ++-- arch/s390/include/asm/syscall_wrapper.h | 12 ++++++------ arch/x86/include/asm/syscall_wrapper.h | 4 ++-- include/asm-generic/error-injection.h | 1 + include/linux/compat.h | 4 ++-- include/linux/syscalls.h | 4 ++-- kernel/fail_function.c | 1 + lib/Kconfig.debug | 15 +++++++++++++++ lib/error-inject.c | 6 ++++++ 10 files changed, 41 insertions(+), 18 deletions(-) diff --git a/arch/arm64/include/asm/syscall_wrapper.h b/arch/arm64/include/asm/syscall_wrapper.h index d30217c21eff..2c5ca239e88c 100644 --- a/arch/arm64/include/asm/syscall_wrapper.h +++ b/arch/arm64/include/asm/syscall_wrapper.h @@ -19,7 +19,7 @@ #define COMPAT_SYSCALL_DEFINEx(x, name, ...) \ asmlinkage long __arm64_compat_sys##name(const struct pt_regs *regs); \ - ALLOW_ERROR_INJECTION(__arm64_compat_sys##name, ERRNO); \ + ALLOW_ERROR_INJECTION(__arm64_compat_sys##name, SYSCALL); \ static long __se_compat_sys##name(__MAP(x,__SC_LONG,__VA_ARGS__)); \ static inline long __do_compat_sys##name(__MAP(x,__SC_DECL,__VA_ARGS__)); \ asmlinkage long __arm64_compat_sys##name(const struct pt_regs *regs) \ @@ -34,7 +34,7 @@ #define COMPAT_SYSCALL_DEFINE0(sname) \ asmlinkage long __arm64_compat_sys_##sname(const struct pt_regs *__unused); \ - ALLOW_ERROR_INJECTION(__arm64_compat_sys_##sname, ERRNO); \ + ALLOW_ERROR_INJECTION(__arm64_compat_sys_##sname, SYSCALL); \ asmlinkage long __arm64_compat_sys_##sname(const struct pt_regs *__unused) #define COND_SYSCALL_COMPAT(name) \ @@ -50,7 +50,7 @@ #define __SYSCALL_DEFINEx(x, name, ...) \ asmlinkage long __arm64_sys##name(const struct pt_regs *regs); \ - ALLOW_ERROR_INJECTION(__arm64_sys##name, ERRNO); \ + ALLOW_ERROR_INJECTION(__arm64_sys##name, SYSCALL); \ static long __se_sys##name(__MAP(x,__SC_LONG,__VA_ARGS__)); \ static inline long __do_sys##name(__MAP(x,__SC_DECL,__VA_ARGS__)); \ asmlinkage long __arm64_sys##name(const struct pt_regs *regs) \ @@ -69,7 +69,7 @@ #define SYSCALL_DEFINE0(sname) \ SYSCALL_METADATA(_##sname, 0); \ asmlinkage long __arm64_sys_##sname(const struct pt_regs *__unused); \ - ALLOW_ERROR_INJECTION(__arm64_sys_##sname, ERRNO); \ + ALLOW_ERROR_INJECTION(__arm64_sys_##sname, SYSCALL); \ asmlinkage long __arm64_sys_##sname(const struct pt_regs *__unused) #define COND_SYSCALL(name) \ diff --git a/arch/powerpc/include/asm/syscall_wrapper.h b/arch/powerpc/include/asm/syscall_wrapper.h index 67486c67e8a2..ce1148809c6b 100644 --- a/arch/powerpc/include/asm/syscall_wrapper.h +++ b/arch/powerpc/include/asm/syscall_wrapper.h @@ -17,7 +17,7 @@ struct pt_regs; #define __SYSCALL_DEFINEx(x, name, ...) \ long sys##name(const struct pt_regs *regs); \ - ALLOW_ERROR_INJECTION(sys##name, ERRNO); \ + ALLOW_ERROR_INJECTION(sys##name, SYSCALL); \ static long __se_sys##name(__MAP(x,__SC_LONG,__VA_ARGS__)); \ static inline long __do_sys##name(__MAP(x,__SC_DECL,__VA_ARGS__)); \ long sys##name(const struct pt_regs *regs) \ @@ -36,7 +36,7 @@ struct pt_regs; #define SYSCALL_DEFINE0(sname) \ SYSCALL_METADATA(_##sname, 0); \ long sys_##sname(const struct pt_regs *__unused); \ - ALLOW_ERROR_INJECTION(sys_##sname, ERRNO); \ + ALLOW_ERROR_INJECTION(sys_##sname, SYSCALL); \ long sys_##sname(const struct pt_regs *__unused) #define COND_SYSCALL(name) \ diff --git a/arch/s390/include/asm/syscall_wrapper.h b/arch/s390/include/asm/syscall_wrapper.h index fde7e6b1df48..a253def48cbe 100644 --- a/arch/s390/include/asm/syscall_wrapper.h +++ b/arch/s390/include/asm/syscall_wrapper.h @@ -58,7 +58,7 @@ #define __S390_SYS_STUBx(x, name, ...) \ long __s390_sys##name(struct pt_regs *regs); \ - ALLOW_ERROR_INJECTION(__s390_sys##name, ERRNO); \ + ALLOW_ERROR_INJECTION(__s390_sys##name, SYSCALL); \ long __s390_sys##name(struct pt_regs *regs) \ { \ long ret = __do_sys##name(SYSCALL_PT_ARGS(x, regs, \ @@ -74,13 +74,13 @@ #define COMPAT_SYSCALL_DEFINE0(sname) \ SYSCALL_METADATA(_##sname, 0); \ long __s390_compat_sys_##sname(void); \ - ALLOW_ERROR_INJECTION(__s390_compat_sys_##sname, ERRNO); \ + ALLOW_ERROR_INJECTION(__s390_compat_sys_##sname, SYSCALL); \ long __s390_compat_sys_##sname(void) #define SYSCALL_DEFINE0(sname) \ SYSCALL_METADATA(_##sname, 0); \ long __s390x_sys_##sname(void); \ - ALLOW_ERROR_INJECTION(__s390x_sys_##sname, ERRNO); \ + ALLOW_ERROR_INJECTION(__s390x_sys_##sname, SYSCALL); \ long __s390_sys_##sname(void) \ __attribute__((alias(__stringify(__s390x_sys_##sname)))); \ long __s390x_sys_##sname(void) @@ -100,7 +100,7 @@ long __s390_compat_sys##name(struct pt_regs *regs); \ long __s390_compat_sys##name(struct pt_regs *regs) \ __attribute__((alias(__stringify(__se_compat_sys##name)))); \ - ALLOW_ERROR_INJECTION(__s390_compat_sys##name, ERRNO); \ + ALLOW_ERROR_INJECTION(__s390_compat_sys##name, SYSCALL); \ static inline long __do_compat_sys##name(__MAP(x,__SC_DECL,__VA_ARGS__)); \ long __se_compat_sys##name(struct pt_regs *regs); \ long __se_compat_sys##name(struct pt_regs *regs) \ @@ -131,7 +131,7 @@ #define SYSCALL_DEFINE0(sname) \ SYSCALL_METADATA(_##sname, 0); \ long __s390x_sys_##sname(void); \ - ALLOW_ERROR_INJECTION(__s390x_sys_##sname, ERRNO); \ + ALLOW_ERROR_INJECTION(__s390x_sys_##sname, SYSCALL); \ long __s390x_sys_##sname(void) #define COND_SYSCALL(name) \ @@ -148,7 +148,7 @@ "Type aliasing is used to sanitize syscall arguments"); \ long __s390x_sys##name(struct pt_regs *regs) \ __attribute__((alias(__stringify(__se_sys##name)))); \ - ALLOW_ERROR_INJECTION(__s390x_sys##name, ERRNO); \ + ALLOW_ERROR_INJECTION(__s390x_sys##name, SYSCALL); \ static inline long __do_sys##name(__MAP(x,__SC_DECL,__VA_ARGS__)); \ long __se_sys##name(struct pt_regs *regs); \ __S390_SYS_STUBx(x, name, __VA_ARGS__) \ diff --git a/arch/x86/include/asm/syscall_wrapper.h b/arch/x86/include/asm/syscall_wrapper.h index fd2669b1cb2d..ca0cd8fa1866 100644 --- a/arch/x86/include/asm/syscall_wrapper.h +++ b/arch/x86/include/asm/syscall_wrapper.h @@ -67,13 +67,13 @@ extern long __ia32_sys_ni_syscall(const struct pt_regs *regs); #define __SYS_STUB0(abi, name) \ long __##abi##_##name(const struct pt_regs *regs); \ - ALLOW_ERROR_INJECTION(__##abi##_##name, ERRNO); \ + ALLOW_ERROR_INJECTION(__##abi##_##name, SYSCALL); \ long __##abi##_##name(const struct pt_regs *regs) \ __alias(__do_##name); #define __SYS_STUBx(abi, name, ...) \ long __##abi##_##name(const struct pt_regs *regs); \ - ALLOW_ERROR_INJECTION(__##abi##_##name, ERRNO); \ + ALLOW_ERROR_INJECTION(__##abi##_##name, SYSCALL); \ long __##abi##_##name(const struct pt_regs *regs) \ { \ return __se_##name(__VA_ARGS__); \ diff --git a/include/asm-generic/error-injection.h b/include/asm-generic/error-injection.h index fbca56bd9cbc..c4fb52f5b789 100644 --- a/include/asm-generic/error-injection.h +++ b/include/asm-generic/error-injection.h @@ -9,6 +9,7 @@ enum { EI_ETYPE_ERRNO, /* Return -ERRNO if failure */ EI_ETYPE_ERRNO_NULL, /* Return -ERRNO or NULL if failure */ EI_ETYPE_TRUE, /* Return true if failure */ + EI_ETYPE_SYSCALL, /* Return -ERRNO out of syscall */ }; struct error_injection_entry { diff --git a/include/linux/compat.h b/include/linux/compat.h index 594357881b0b..21d2fd48f7e2 100644 --- a/include/linux/compat.h +++ b/include/linux/compat.h @@ -45,7 +45,7 @@ #ifndef COMPAT_SYSCALL_DEFINE0 #define COMPAT_SYSCALL_DEFINE0(name) \ asmlinkage long compat_sys_##name(void); \ - ALLOW_ERROR_INJECTION(compat_sys_##name, ERRNO); \ + ALLOW_ERROR_INJECTION(compat_sys_##name, SYSCALL); \ asmlinkage long compat_sys_##name(void) #endif /* COMPAT_SYSCALL_DEFINE0 */ @@ -74,7 +74,7 @@ "Type aliasing is used to sanitize syscall arguments");\ asmlinkage long compat_sys##name(__MAP(x,__SC_DECL,__VA_ARGS__)) \ __attribute__((alias(__stringify(__se_compat_sys##name)))); \ - ALLOW_ERROR_INJECTION(compat_sys##name, ERRNO); \ + ALLOW_ERROR_INJECTION(compat_sys##name, SYSCALL); \ static inline long __do_compat_sys##name(__MAP(x,__SC_DECL,__VA_ARGS__));\ asmlinkage long __se_compat_sys##name(__MAP(x,__SC_LONG,__VA_ARGS__)); \ asmlinkage long __se_compat_sys##name(__MAP(x,__SC_LONG,__VA_ARGS__)) \ diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h index a34b0f9a9972..05fc3a0575c0 100644 --- a/include/linux/syscalls.h +++ b/include/linux/syscalls.h @@ -210,7 +210,7 @@ static inline int is_syscall_trace_event(struct trace_event_call *tp_event) #define SYSCALL_DEFINE0(sname) \ SYSCALL_METADATA(_##sname, 0); \ asmlinkage long sys_##sname(void); \ - ALLOW_ERROR_INJECTION(sys_##sname, ERRNO); \ + ALLOW_ERROR_INJECTION(sys_##sname, SYSCALL); \ asmlinkage long sys_##sname(void) #endif /* SYSCALL_DEFINE0 */ @@ -241,7 +241,7 @@ static inline int is_syscall_trace_event(struct trace_event_call *tp_event) "Type aliasing is used to sanitize syscall arguments");\ asmlinkage long sys##name(__MAP(x,__SC_DECL,__VA_ARGS__)) \ __attribute__((alias(__stringify(__se_sys##name)))); \ - ALLOW_ERROR_INJECTION(sys##name, ERRNO); \ + ALLOW_ERROR_INJECTION(sys##name, SYSCALL); \ static inline long __do_sys##name(__MAP(x,__SC_DECL,__VA_ARGS__));\ asmlinkage long __se_sys##name(__MAP(x,__SC_LONG,__VA_ARGS__)); \ asmlinkage long __se_sys##name(__MAP(x,__SC_LONG,__VA_ARGS__)) \ diff --git a/kernel/fail_function.c b/kernel/fail_function.c index a7ccd2930c5f..65d3f5db5f3a 100644 --- a/kernel/fail_function.c +++ b/kernel/fail_function.c @@ -38,6 +38,7 @@ static unsigned long adjust_error_retval(unsigned long addr, unsigned long retv) switch (get_injectable_error_type(addr)) { case EI_ETYPE_NULL: return 0; + case EI_ETYPE_SYSCALL: case EI_ETYPE_ERRNO: if (retv < (unsigned long)-MAX_ERRNO) return (unsigned long)-EINVAL; diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug index 38545c56bf69..729002405a55 100644 --- a/lib/Kconfig.debug +++ b/lib/Kconfig.debug @@ -1874,8 +1874,12 @@ config NETDEV_NOTIFIER_ERROR_INJECT If unsure, say N. config FUNCTION_ERROR_INJECTION + bool + +config FUNC_ERROR_INJECTION bool "Fault-injections of functions" depends on HAVE_FUNCTION_ERROR_INJECTION && KPROBES + select FUNCTION_ERROR_INJECTION help Add fault injections into various functions that are annotated with ALLOW_ERROR_INJECTION() in the kernel. BPF may also modify the return @@ -1883,6 +1887,17 @@ config FUNCTION_ERROR_INJECTION If unsure, say N +config SYSCALL_ERROR_INJECTION + bool "Error injections in syscalls" + depends on HAVE_FUNCTION_ERROR_INJECTION && KPROBES + select FUNCTION_ERROR_INJECTION + default y + help + Allows error injection framework to return errors from syscalls. + BPF may modify return values of syscalls as well. + + If unsure, say Y + config FAULT_INJECTION bool "Fault-injection framework" depends on DEBUG_KERNEL diff --git a/lib/error-inject.c b/lib/error-inject.c index 1afca1b1cdea..9ba868eb8c43 100644 --- a/lib/error-inject.c +++ b/lib/error-inject.c @@ -71,6 +71,10 @@ static void populate_error_injection_list(struct error_injection_entry *start, mutex_lock(&ei_mutex); for (iter = start; iter < end; iter++) { + if (iter->etype != EI_ETYPE_SYSCALL && + !IS_ENABLED(CONFIG_FUNC_ERROR_INJECTION)) + continue; + entry = (unsigned long)dereference_symbol_descriptor((void *)iter->addr); if (!kernel_text_address(entry) || @@ -189,6 +193,8 @@ static const char *error_type_string(int etype) return "ERRNO_NULL"; case EI_ETYPE_TRUE: return "TRUE"; + case EI_ETYPE_SYSCALL: + return "SYSCALL"; default: return "(unknown)"; }
On Fri, 2 Dec 2022 13:27:11 -0800 Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote: > --- a/lib/Kconfig.debug > +++ b/lib/Kconfig.debug > @@ -1874,8 +1874,12 @@ config NETDEV_NOTIFIER_ERROR_INJECT > If unsure, say N. > > config FUNCTION_ERROR_INJECTION Why not just call this "ERROR_INJECTION" having this be FUNCTION and the one for functions be FUNC is confusing. > + bool > + > +config FUNC_ERROR_INJECTION > bool "Fault-injections of functions" > depends on HAVE_FUNCTION_ERROR_INJECTION && KPROBES > + select FUNCTION_ERROR_INJECTION > help > Add fault injections into various functions that are annotated with > ALLOW_ERROR_INJECTION() in the kernel. BPF may also modify the return > @@ -1883,6 +1887,17 @@ config FUNCTION_ERROR_INJECTION > > If unsure, say N > > +config SYSCALL_ERROR_INJECTION > + bool "Error injections in syscalls" > + depends on HAVE_FUNCTION_ERROR_INJECTION && KPROBES > + select FUNCTION_ERROR_INJECTION > + default y IIUC, Linus prefers everything to be "default n" unless there's a really good reason for it. Like only making other options available, but not doing anything to the kernel. I do have DYNAMIC_FTRACE as "default y" but that's only because it depends on CONFIG_FUNCTION_TRACER and nobody that enables that should have DYNAMIC_FTRACE off (except for academia). > + help > + Allows error injection framework to return errors from syscalls. > + BPF may modify return values of syscalls as well. And here's the thing. If BPF returns anything *but* an error, then this is a misnomer and incorrect. Name it something else like "HIJACK_SYSCALLS". > + > + If unsure, say Y And I'm curious, why Y if unsure? -- Steve > + > config FAULT_INJECTION > bool "Fault-injection framework" > depends on DEBUG_KERNEL > diff --git a/lib/error-inject.c b/lib/error-inject.c > index 1afca1b1cdea..9ba868eb8c43 100644 > --- a/lib/error-inject.c > +++ b/lib/error-inject.c > @@ -71,6 +71,10 @@ static void populate_error_injection_list(struct error_injection_entry *start, > > mutex_lock(&ei_mutex); > for (iter = start; iter < end; iter++) { > + if (iter->etype != EI_ETYPE_SYSCALL && > + !IS_ENABLED(CONFIG_FUNC_ERROR_INJECTION)) > + continue; > + > entry = (unsigned long)dereference_symbol_descriptor((void *)iter->addr); > > if (!kernel_text_address(entry) || > @@ -189,6 +193,8 @@ static const char *error_type_string(int etype) > return "ERRNO_NULL"; > case EI_ETYPE_TRUE: > return "TRUE"; > + case EI_ETYPE_SYSCALL: > + return "SYSCALL"; > default: > return "(unknown)"; > }
On Fri, Dec 02, 2022 at 06:17:24PM -0500, Steven Rostedt wrote: > On Fri, 2 Dec 2022 13:27:11 -0800 > Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote: > > > --- a/lib/Kconfig.debug > > +++ b/lib/Kconfig.debug > > @@ -1874,8 +1874,12 @@ config NETDEV_NOTIFIER_ERROR_INJECT > > If unsure, say N. > > > > config FUNCTION_ERROR_INJECTION > > Why not just call this "ERROR_INJECTION" having this be FUNCTION and the > one for functions be FUNC is confusing. That's what I had initially, but it causes plenty of churn to arch/*/Makefile and a bunch of .h-s. Keeping it as FUNCTION_ERROR_INJECTION removes all that noise from the diff. > > + bool > > + > > +config FUNC_ERROR_INJECTION > > bool "Fault-injections of functions" > > depends on HAVE_FUNCTION_ERROR_INJECTION && KPROBES > > + select FUNCTION_ERROR_INJECTION > > help > > Add fault injections into various functions that are annotated with > > ALLOW_ERROR_INJECTION() in the kernel. BPF may also modify the return > > @@ -1883,6 +1887,17 @@ config FUNCTION_ERROR_INJECTION > > > > If unsure, say N > > > > +config SYSCALL_ERROR_INJECTION > > + bool "Error injections in syscalls" > > + depends on HAVE_FUNCTION_ERROR_INJECTION && KPROBES > > + select FUNCTION_ERROR_INJECTION > > + default y > > IIUC, Linus prefers everything to be "default n" unless there's a really > good reason for it. Like only making other options available, but not doing > anything to the kernel. I do have DYNAMIC_FTRACE as "default y" but that's > only because it depends on CONFIG_FUNCTION_TRACER and nobody that enables > that should have DYNAMIC_FTRACE off (except for academia). The FUNCTION_ERROR_INJECTION used to be "def_bool y" for ~5 years. BTW the macro was called BPF_ALLOW_ERROR_INJECTION() when Josef initially implemented it. Massami later renamed it ALLOW_ERROR_INJECTION() and allowed kprobes to use it. Today there is a user expectation that this feature is available in the kernel. We can do "default n" here, let distros decide and potentially upset users. I don't feel strongly about that. > > > + help > > + Allows error injection framework to return errors from syscalls. > > + BPF may modify return values of syscalls as well. > > And here's the thing. If BPF returns anything *but* an error, then this is > a misnomer and incorrect. Name it something else like "HIJACK_SYSCALLS". The bpf prog must return errno. No doubt about that. Today the verifier validates return values whenever is necessary. When original bpf_override_return was added the verifier wasn't that smart. Since then we added return value checks pretty much everywhere. Looks like the check is still missing bpf_override_return. We will fix it asap. > > + > > + If unsure, say Y > > And I'm curious, why Y if unsure? Copy-paste. I can remove that line.
On Fri, 2 Dec 2022 13:27:11 -0800 Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote: > On Fri, Dec 02, 2022 at 10:56:52AM -0500, Theodore Ts'o wrote: > > On Thu, Dec 01, 2022 at 05:41:29PM -0800, Alexei Starovoitov wrote: > > > > > > The fault injection framework disables individual syscall with zero performance > > > overhead comparing to LSM and seccomp mechanisms. > > > BPF is not involved here. It's a kprobe in one spot. > > > All other syscalls don't notice it. > > > It's an attractive way to improve security. > > > > > > A BPF prog over syscall can filter by user, cgroup, task and give fine grain > > > control over security surface. > > > tbh I'm not aware of folks doing "syscall disabling" through command line like > > > above (I've only seen it through bpf), but it doesn't mean that somebody will > > > not start complaining that their script broke, because distro disabled fault > > > injection. > > > > > > So should we split FUNCTION_ERROR_INJECTION kconfig into two ? > > > And do default N for things like should_failslab() and > > > default Y for syscalls? > > > > How about calling the latter something like bpf syscall hooks, and not > > using the terminology "error injection" in relation to system calls? > > I think that might be less confusing. > > I think 'syscall error injection' name fits well. > It's a generic feature that both kprobes and bpf should be able to use. > Here is the patch... > > Even with this patch we have 7 failures in BPF selftests. > We will fix them later with the same mechanism as we will pick for hid-bpf. > > This patch will keep 'syscall disabling' scripts working > and bpf syscall adjustment will work too. > So no chance of breaking anyone. > While actual error injection inside the kernel will be disabled. > > Better name suggestions are welcome, of course. > > From 2960958f91d1134b1a8f27787875f6b9300f205e Mon Sep 17 00:00:00 2001 > From: Alexei Starovoitov <ast@kernel.org> > Date: Fri, 2 Dec 2022 13:06:08 -0800 > Subject: [PATCH] error-injection: Split FUNCTION_ERROR_INJECTION into syscalls > and the rest. > > Split FUNCTION_ERROR_INJECTION into: > - SYSCALL_ERROR_INJECTION with default y > - FUNC_ERROR_INJECTION with default n. OK, syscall is a bit different, it is clearly the boundary of the functionality, so this seems safe. IMHO, it is better to extend seccomp framework for testing. > > The former is only used to modify return values of syscalls for security and > user space testing reasons while the latter is for the rest of error injection > in the kernel that should only be used to stress test and debug the kernel. > > Signed-off-by: Alexei Starovoitov <ast@kernel.org> > --- > arch/arm64/include/asm/syscall_wrapper.h | 8 ++++---- > arch/powerpc/include/asm/syscall_wrapper.h | 4 ++-- > arch/s390/include/asm/syscall_wrapper.h | 12 ++++++------ > arch/x86/include/asm/syscall_wrapper.h | 4 ++-- > include/asm-generic/error-injection.h | 1 + > include/linux/compat.h | 4 ++-- > include/linux/syscalls.h | 4 ++-- > kernel/fail_function.c | 1 + > lib/Kconfig.debug | 15 +++++++++++++++ > lib/error-inject.c | 6 ++++++ > 10 files changed, 41 insertions(+), 18 deletions(-) > > diff --git a/arch/arm64/include/asm/syscall_wrapper.h b/arch/arm64/include/asm/syscall_wrapper.h > index d30217c21eff..2c5ca239e88c 100644 > --- a/arch/arm64/include/asm/syscall_wrapper.h > +++ b/arch/arm64/include/asm/syscall_wrapper.h > @@ -19,7 +19,7 @@ > > #define COMPAT_SYSCALL_DEFINEx(x, name, ...) \ > asmlinkage long __arm64_compat_sys##name(const struct pt_regs *regs); \ > - ALLOW_ERROR_INJECTION(__arm64_compat_sys##name, ERRNO); \ > + ALLOW_ERROR_INJECTION(__arm64_compat_sys##name, SYSCALL); \ But in that case, please don't use ALLOW_ERROR_INJECTION. I don't want to mix up the function-level error injection(FEI) and syscall error injection. For this reason, I want to decouple it from the FEI. FEI will be used for more kernel internal functions under development (or debugging), which can break something because it will forcibly change the code behavior and the kernel will be in unexpected state. Thank you, > static long __se_compat_sys##name(__MAP(x,__SC_LONG,__VA_ARGS__)); \ > static inline long __do_compat_sys##name(__MAP(x,__SC_DECL,__VA_ARGS__)); \ > asmlinkage long __arm64_compat_sys##name(const struct pt_regs *regs) \ > @@ -34,7 +34,7 @@ > > #define COMPAT_SYSCALL_DEFINE0(sname) \ > asmlinkage long __arm64_compat_sys_##sname(const struct pt_regs *__unused); \ > - ALLOW_ERROR_INJECTION(__arm64_compat_sys_##sname, ERRNO); \ > + ALLOW_ERROR_INJECTION(__arm64_compat_sys_##sname, SYSCALL); \ > asmlinkage long __arm64_compat_sys_##sname(const struct pt_regs *__unused) > > #define COND_SYSCALL_COMPAT(name) \ > @@ -50,7 +50,7 @@ > > #define __SYSCALL_DEFINEx(x, name, ...) \ > asmlinkage long __arm64_sys##name(const struct pt_regs *regs); \ > - ALLOW_ERROR_INJECTION(__arm64_sys##name, ERRNO); \ > + ALLOW_ERROR_INJECTION(__arm64_sys##name, SYSCALL); \ > static long __se_sys##name(__MAP(x,__SC_LONG,__VA_ARGS__)); \ > static inline long __do_sys##name(__MAP(x,__SC_DECL,__VA_ARGS__)); \ > asmlinkage long __arm64_sys##name(const struct pt_regs *regs) \ > @@ -69,7 +69,7 @@ > #define SYSCALL_DEFINE0(sname) \ > SYSCALL_METADATA(_##sname, 0); \ > asmlinkage long __arm64_sys_##sname(const struct pt_regs *__unused); \ > - ALLOW_ERROR_INJECTION(__arm64_sys_##sname, ERRNO); \ > + ALLOW_ERROR_INJECTION(__arm64_sys_##sname, SYSCALL); \ > asmlinkage long __arm64_sys_##sname(const struct pt_regs *__unused) > > #define COND_SYSCALL(name) \ > diff --git a/arch/powerpc/include/asm/syscall_wrapper.h b/arch/powerpc/include/asm/syscall_wrapper.h > index 67486c67e8a2..ce1148809c6b 100644 > --- a/arch/powerpc/include/asm/syscall_wrapper.h > +++ b/arch/powerpc/include/asm/syscall_wrapper.h > @@ -17,7 +17,7 @@ struct pt_regs; > > #define __SYSCALL_DEFINEx(x, name, ...) \ > long sys##name(const struct pt_regs *regs); \ > - ALLOW_ERROR_INJECTION(sys##name, ERRNO); \ > + ALLOW_ERROR_INJECTION(sys##name, SYSCALL); \ > static long __se_sys##name(__MAP(x,__SC_LONG,__VA_ARGS__)); \ > static inline long __do_sys##name(__MAP(x,__SC_DECL,__VA_ARGS__)); \ > long sys##name(const struct pt_regs *regs) \ > @@ -36,7 +36,7 @@ struct pt_regs; > #define SYSCALL_DEFINE0(sname) \ > SYSCALL_METADATA(_##sname, 0); \ > long sys_##sname(const struct pt_regs *__unused); \ > - ALLOW_ERROR_INJECTION(sys_##sname, ERRNO); \ > + ALLOW_ERROR_INJECTION(sys_##sname, SYSCALL); \ > long sys_##sname(const struct pt_regs *__unused) > > #define COND_SYSCALL(name) \ > diff --git a/arch/s390/include/asm/syscall_wrapper.h b/arch/s390/include/asm/syscall_wrapper.h > index fde7e6b1df48..a253def48cbe 100644 > --- a/arch/s390/include/asm/syscall_wrapper.h > +++ b/arch/s390/include/asm/syscall_wrapper.h > @@ -58,7 +58,7 @@ > > #define __S390_SYS_STUBx(x, name, ...) \ > long __s390_sys##name(struct pt_regs *regs); \ > - ALLOW_ERROR_INJECTION(__s390_sys##name, ERRNO); \ > + ALLOW_ERROR_INJECTION(__s390_sys##name, SYSCALL); \ > long __s390_sys##name(struct pt_regs *regs) \ > { \ > long ret = __do_sys##name(SYSCALL_PT_ARGS(x, regs, \ > @@ -74,13 +74,13 @@ > #define COMPAT_SYSCALL_DEFINE0(sname) \ > SYSCALL_METADATA(_##sname, 0); \ > long __s390_compat_sys_##sname(void); \ > - ALLOW_ERROR_INJECTION(__s390_compat_sys_##sname, ERRNO); \ > + ALLOW_ERROR_INJECTION(__s390_compat_sys_##sname, SYSCALL); \ > long __s390_compat_sys_##sname(void) > > #define SYSCALL_DEFINE0(sname) \ > SYSCALL_METADATA(_##sname, 0); \ > long __s390x_sys_##sname(void); \ > - ALLOW_ERROR_INJECTION(__s390x_sys_##sname, ERRNO); \ > + ALLOW_ERROR_INJECTION(__s390x_sys_##sname, SYSCALL); \ > long __s390_sys_##sname(void) \ > __attribute__((alias(__stringify(__s390x_sys_##sname)))); \ > long __s390x_sys_##sname(void) > @@ -100,7 +100,7 @@ > long __s390_compat_sys##name(struct pt_regs *regs); \ > long __s390_compat_sys##name(struct pt_regs *regs) \ > __attribute__((alias(__stringify(__se_compat_sys##name)))); \ > - ALLOW_ERROR_INJECTION(__s390_compat_sys##name, ERRNO); \ > + ALLOW_ERROR_INJECTION(__s390_compat_sys##name, SYSCALL); \ > static inline long __do_compat_sys##name(__MAP(x,__SC_DECL,__VA_ARGS__)); \ > long __se_compat_sys##name(struct pt_regs *regs); \ > long __se_compat_sys##name(struct pt_regs *regs) \ > @@ -131,7 +131,7 @@ > #define SYSCALL_DEFINE0(sname) \ > SYSCALL_METADATA(_##sname, 0); \ > long __s390x_sys_##sname(void); \ > - ALLOW_ERROR_INJECTION(__s390x_sys_##sname, ERRNO); \ > + ALLOW_ERROR_INJECTION(__s390x_sys_##sname, SYSCALL); \ > long __s390x_sys_##sname(void) > > #define COND_SYSCALL(name) \ > @@ -148,7 +148,7 @@ > "Type aliasing is used to sanitize syscall arguments"); \ > long __s390x_sys##name(struct pt_regs *regs) \ > __attribute__((alias(__stringify(__se_sys##name)))); \ > - ALLOW_ERROR_INJECTION(__s390x_sys##name, ERRNO); \ > + ALLOW_ERROR_INJECTION(__s390x_sys##name, SYSCALL); \ > static inline long __do_sys##name(__MAP(x,__SC_DECL,__VA_ARGS__)); \ > long __se_sys##name(struct pt_regs *regs); \ > __S390_SYS_STUBx(x, name, __VA_ARGS__) \ > diff --git a/arch/x86/include/asm/syscall_wrapper.h b/arch/x86/include/asm/syscall_wrapper.h > index fd2669b1cb2d..ca0cd8fa1866 100644 > --- a/arch/x86/include/asm/syscall_wrapper.h > +++ b/arch/x86/include/asm/syscall_wrapper.h > @@ -67,13 +67,13 @@ extern long __ia32_sys_ni_syscall(const struct pt_regs *regs); > > #define __SYS_STUB0(abi, name) \ > long __##abi##_##name(const struct pt_regs *regs); \ > - ALLOW_ERROR_INJECTION(__##abi##_##name, ERRNO); \ > + ALLOW_ERROR_INJECTION(__##abi##_##name, SYSCALL); \ > long __##abi##_##name(const struct pt_regs *regs) \ > __alias(__do_##name); > > #define __SYS_STUBx(abi, name, ...) \ > long __##abi##_##name(const struct pt_regs *regs); \ > - ALLOW_ERROR_INJECTION(__##abi##_##name, ERRNO); \ > + ALLOW_ERROR_INJECTION(__##abi##_##name, SYSCALL); \ > long __##abi##_##name(const struct pt_regs *regs) \ > { \ > return __se_##name(__VA_ARGS__); \ > diff --git a/include/asm-generic/error-injection.h b/include/asm-generic/error-injection.h > index fbca56bd9cbc..c4fb52f5b789 100644 > --- a/include/asm-generic/error-injection.h > +++ b/include/asm-generic/error-injection.h > @@ -9,6 +9,7 @@ enum { > EI_ETYPE_ERRNO, /* Return -ERRNO if failure */ > EI_ETYPE_ERRNO_NULL, /* Return -ERRNO or NULL if failure */ > EI_ETYPE_TRUE, /* Return true if failure */ > + EI_ETYPE_SYSCALL, /* Return -ERRNO out of syscall */ > }; > > struct error_injection_entry { > diff --git a/include/linux/compat.h b/include/linux/compat.h > index 594357881b0b..21d2fd48f7e2 100644 > --- a/include/linux/compat.h > +++ b/include/linux/compat.h > @@ -45,7 +45,7 @@ > #ifndef COMPAT_SYSCALL_DEFINE0 > #define COMPAT_SYSCALL_DEFINE0(name) \ > asmlinkage long compat_sys_##name(void); \ > - ALLOW_ERROR_INJECTION(compat_sys_##name, ERRNO); \ > + ALLOW_ERROR_INJECTION(compat_sys_##name, SYSCALL); \ > asmlinkage long compat_sys_##name(void) > #endif /* COMPAT_SYSCALL_DEFINE0 */ > > @@ -74,7 +74,7 @@ > "Type aliasing is used to sanitize syscall arguments");\ > asmlinkage long compat_sys##name(__MAP(x,__SC_DECL,__VA_ARGS__)) \ > __attribute__((alias(__stringify(__se_compat_sys##name)))); \ > - ALLOW_ERROR_INJECTION(compat_sys##name, ERRNO); \ > + ALLOW_ERROR_INJECTION(compat_sys##name, SYSCALL); \ > static inline long __do_compat_sys##name(__MAP(x,__SC_DECL,__VA_ARGS__));\ > asmlinkage long __se_compat_sys##name(__MAP(x,__SC_LONG,__VA_ARGS__)); \ > asmlinkage long __se_compat_sys##name(__MAP(x,__SC_LONG,__VA_ARGS__)) \ > diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h > index a34b0f9a9972..05fc3a0575c0 100644 > --- a/include/linux/syscalls.h > +++ b/include/linux/syscalls.h > @@ -210,7 +210,7 @@ static inline int is_syscall_trace_event(struct trace_event_call *tp_event) > #define SYSCALL_DEFINE0(sname) \ > SYSCALL_METADATA(_##sname, 0); \ > asmlinkage long sys_##sname(void); \ > - ALLOW_ERROR_INJECTION(sys_##sname, ERRNO); \ > + ALLOW_ERROR_INJECTION(sys_##sname, SYSCALL); \ > asmlinkage long sys_##sname(void) > #endif /* SYSCALL_DEFINE0 */ > > @@ -241,7 +241,7 @@ static inline int is_syscall_trace_event(struct trace_event_call *tp_event) > "Type aliasing is used to sanitize syscall arguments");\ > asmlinkage long sys##name(__MAP(x,__SC_DECL,__VA_ARGS__)) \ > __attribute__((alias(__stringify(__se_sys##name)))); \ > - ALLOW_ERROR_INJECTION(sys##name, ERRNO); \ > + ALLOW_ERROR_INJECTION(sys##name, SYSCALL); \ > static inline long __do_sys##name(__MAP(x,__SC_DECL,__VA_ARGS__));\ > asmlinkage long __se_sys##name(__MAP(x,__SC_LONG,__VA_ARGS__)); \ > asmlinkage long __se_sys##name(__MAP(x,__SC_LONG,__VA_ARGS__)) \ > diff --git a/kernel/fail_function.c b/kernel/fail_function.c > index a7ccd2930c5f..65d3f5db5f3a 100644 > --- a/kernel/fail_function.c > +++ b/kernel/fail_function.c > @@ -38,6 +38,7 @@ static unsigned long adjust_error_retval(unsigned long addr, unsigned long retv) > switch (get_injectable_error_type(addr)) { > case EI_ETYPE_NULL: > return 0; > + case EI_ETYPE_SYSCALL: > case EI_ETYPE_ERRNO: > if (retv < (unsigned long)-MAX_ERRNO) > return (unsigned long)-EINVAL; > diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug > index 38545c56bf69..729002405a55 100644 > --- a/lib/Kconfig.debug > +++ b/lib/Kconfig.debug > @@ -1874,8 +1874,12 @@ config NETDEV_NOTIFIER_ERROR_INJECT > If unsure, say N. > > config FUNCTION_ERROR_INJECTION > + bool > + > +config FUNC_ERROR_INJECTION > bool "Fault-injections of functions" > depends on HAVE_FUNCTION_ERROR_INJECTION && KPROBES > + select FUNCTION_ERROR_INJECTION > help > Add fault injections into various functions that are annotated with > ALLOW_ERROR_INJECTION() in the kernel. BPF may also modify the return > @@ -1883,6 +1887,17 @@ config FUNCTION_ERROR_INJECTION > > If unsure, say N > > +config SYSCALL_ERROR_INJECTION > + bool "Error injections in syscalls" > + depends on HAVE_FUNCTION_ERROR_INJECTION && KPROBES > + select FUNCTION_ERROR_INJECTION > + default y > + help > + Allows error injection framework to return errors from syscalls. > + BPF may modify return values of syscalls as well. > + > + If unsure, say Y > + > config FAULT_INJECTION > bool "Fault-injection framework" > depends on DEBUG_KERNEL > diff --git a/lib/error-inject.c b/lib/error-inject.c > index 1afca1b1cdea..9ba868eb8c43 100644 > --- a/lib/error-inject.c > +++ b/lib/error-inject.c > @@ -71,6 +71,10 @@ static void populate_error_injection_list(struct error_injection_entry *start, > > mutex_lock(&ei_mutex); > for (iter = start; iter < end; iter++) { > + if (iter->etype != EI_ETYPE_SYSCALL && > + !IS_ENABLED(CONFIG_FUNC_ERROR_INJECTION)) > + continue; > + > entry = (unsigned long)dereference_symbol_descriptor((void *)iter->addr); > > if (!kernel_text_address(entry) || > @@ -189,6 +193,8 @@ static const char *error_type_string(int etype) > return "ERRNO_NULL"; > case EI_ETYPE_TRUE: > return "TRUE"; > + case EI_ETYPE_SYSCALL: > + return "SYSCALL"; > default: > return "(unknown)"; > } > -- > 2.30.2 >
On Dec 02 2022, Alexei Starovoitov wrote: > On Fri, Dec 02, 2022 at 03:55:38PM +0100, Benjamin Tissoires wrote: > > > > > > On 12/1/22 22:13, Linus Torvalds wrote: > > > On Thu, Dec 1, 2022 at 8:59 AM Alexei Starovoitov > > > <alexei.starovoitov@gmail.com> wrote: > > > > > > > > The hid-bpf framework depends on it. > > > > > > Ok, this is completely unacceptably disgusting hack. > > > > [foreword: I have read the other replies, just replying to this one > > because it is the explicit ask for a fix] > > > > > > > > That needs fixing. > > > > > > > Either hid-bpf or bpf core can add > > > > "depends on FUNCTION_ERROR_INJECTION" > > > > > > No, it needs to be narrowed down a lot. Nobody sane wants error > > > injection just because they want some random HID thing. > > > > > > And no, BPF shouldn't need it either. > > > > > > This needs to be narrowed down to the point where HID can say "I want > > > *this* particular call to be able to be a bpf call. > > > > So, would the following be OK? I still have a few concerns I'll explain > > after the patch. > > [...] > > > > While this patch removes the need for ALLOW_ERROR_INJECTION it has a > > couple of drawbacks: > > - suddenly we lose the nice separation of concerns between bpf core and > > its users (HID in my case) > > - it would need to be changed in 6.3 simply because of the previous > > point, so it is just a temporary fix. And third, it was bogus because the check_attach_modify_return() test was inverted. > > Agree, but it works short term. > A silver lining is BTF_SET approach consumes 4 bytes per mark > while ALLOW_ERROR_INJECTION consumes 16 bytes for struct error_injection_entry > and then another 48 bytes per mark for struct ei_entry. > > An alternative would be to define a known prefix like "bpf_modret_" > or "bpf_hook_" and rename these three functions. Not a big fan of that idea, sorry. It would work, for sure, but I don't want to prefix my subsystem API by "bpf_modret_" which would make it look like it is not part of my subsystem. > > Then there will be no need for #include <linux/hid_bpf.h> in bpf core. So I took your other advice to look into register_btf_kfunc_id_set(). And given that the internals of kfuncs are no more than a binary list of ids, we can easily adapt it to have a better API to declare which functions are fmodret. See [1] for the new series. The net benefit are that now we can declare those functions outside of any ALLOW_ERROR_INJECTION, outside of bpf-core, and also we can tag sleepable ones which was not the case previously. So I am rather happy with that v2. I'm sure there will be bikeshedding, but this one looks better than my previous patch here. > > > So I am not sure if this would qualify HID-BPF for 6.2. Please speak up. > > Since that was the only thing I think it's fine to stay in the queue. My plan is to see if we can get this validated in the next 2 days and if not, drop it from 6.2 and reintroduce it in 6.3. After the weekend I realized that delaying HID_BPF for one more release was not too much of an issue in the end. Cheers, Benjamin [1] https://lore.kernel.org/all/20221205164856.705656-2-benjamin.tissoires@redhat.com/
On Mon, Dec 05, 2022 at 07:50:15AM +0900, Masami Hiramatsu wrote: > On Fri, 2 Dec 2022 13:27:11 -0800 > Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote: > > > On Fri, Dec 02, 2022 at 10:56:52AM -0500, Theodore Ts'o wrote: > > > On Thu, Dec 01, 2022 at 05:41:29PM -0800, Alexei Starovoitov wrote: > > > > > > > > The fault injection framework disables individual syscall with zero performance > > > > overhead comparing to LSM and seccomp mechanisms. > > > > BPF is not involved here. It's a kprobe in one spot. > > > > All other syscalls don't notice it. > > > > It's an attractive way to improve security. > > > > > > > > A BPF prog over syscall can filter by user, cgroup, task and give fine grain > > > > control over security surface. > > > > tbh I'm not aware of folks doing "syscall disabling" through command line like > > > > above (I've only seen it through bpf), but it doesn't mean that somebody will > > > > not start complaining that their script broke, because distro disabled fault > > > > injection. > > > > > > > > So should we split FUNCTION_ERROR_INJECTION kconfig into two ? > > > > And do default N for things like should_failslab() and > > > > default Y for syscalls? > > > > > > How about calling the latter something like bpf syscall hooks, and not > > > using the terminology "error injection" in relation to system calls? > > > I think that might be less confusing. > > > > I think 'syscall error injection' name fits well. > > It's a generic feature that both kprobes and bpf should be able to use. > > Here is the patch... > > > > Even with this patch we have 7 failures in BPF selftests. > > We will fix them later with the same mechanism as we will pick for hid-bpf. > > > > This patch will keep 'syscall disabling' scripts working > > and bpf syscall adjustment will work too. > > So no chance of breaking anyone. > > While actual error injection inside the kernel will be disabled. > > > > Better name suggestions are welcome, of course. > > > > From 2960958f91d1134b1a8f27787875f6b9300f205e Mon Sep 17 00:00:00 2001 > > From: Alexei Starovoitov <ast@kernel.org> > > Date: Fri, 2 Dec 2022 13:06:08 -0800 > > Subject: [PATCH] error-injection: Split FUNCTION_ERROR_INJECTION into syscalls > > and the rest. > > > > Split FUNCTION_ERROR_INJECTION into: > > - SYSCALL_ERROR_INJECTION with default y > > - FUNC_ERROR_INJECTION with default n. > > OK, syscall is a bit different, it is clearly the boundary of the > functionality, so this seems safe. > IMHO, it is better to extend seccomp framework for testing. seccomp doesn't support eBPF > > > > The former is only used to modify return values of syscalls for security and > > user space testing reasons while the latter is for the rest of error injection > > in the kernel that should only be used to stress test and debug the kernel. > > > > Signed-off-by: Alexei Starovoitov <ast@kernel.org> > > --- > > arch/arm64/include/asm/syscall_wrapper.h | 8 ++++---- > > arch/powerpc/include/asm/syscall_wrapper.h | 4 ++-- > > arch/s390/include/asm/syscall_wrapper.h | 12 ++++++------ > > arch/x86/include/asm/syscall_wrapper.h | 4 ++-- > > include/asm-generic/error-injection.h | 1 + > > include/linux/compat.h | 4 ++-- > > include/linux/syscalls.h | 4 ++-- > > kernel/fail_function.c | 1 + > > lib/Kconfig.debug | 15 +++++++++++++++ > > lib/error-inject.c | 6 ++++++ > > 10 files changed, 41 insertions(+), 18 deletions(-) > > > > diff --git a/arch/arm64/include/asm/syscall_wrapper.h b/arch/arm64/include/asm/syscall_wrapper.h > > index d30217c21eff..2c5ca239e88c 100644 > > --- a/arch/arm64/include/asm/syscall_wrapper.h > > +++ b/arch/arm64/include/asm/syscall_wrapper.h > > @@ -19,7 +19,7 @@ > > > > #define COMPAT_SYSCALL_DEFINEx(x, name, ...) \ > > asmlinkage long __arm64_compat_sys##name(const struct pt_regs *regs); \ > > - ALLOW_ERROR_INJECTION(__arm64_compat_sys##name, ERRNO); \ > > + ALLOW_ERROR_INJECTION(__arm64_compat_sys##name, SYSCALL); \ > > But in that case, please don't use ALLOW_ERROR_INJECTION. I don't want to > mix up the function-level error injection(FEI) and syscall error injection. Are you suggesting to copy-paste ALLOW_ERROR_INJECTION logic into another special section, another vmlinux.lds.h hack, copy-paste of lib/error-inject.c ? Only to have a different name? Sorry, but that makes no sense. syscalls return errno towards user space, while the rest of 'error inject' funcs return errno towards the kernel. Both are quite similar. There is no need to duplicate: debugfs_create_dir("error_injection", ... fault_create_debugfs_attr("fail_function", ... > For this reason, I want to decouple it from the FEI. FEI will be used > for more kernel internal functions under development (or debugging), > which can break something because it will forcibly change the code > behavior and the kernel will be in unexpected state. There is no 'unexpected state'. When Josef added BPF_ALLOW_ERROR_INJECTION() in include/linux/bpf.h we marked several functions in fs/btrfs/ this way. Later more functions were marked. The callers of all those functions have to be ready to deal with errors. If any of the currently marked functions can oops the kernel it's a bug in the caller and it has to be fixed, because normal execution can sooner or later return similar error. Consider ALLOW_ERROR_INJECTION(should_failslab, ERRNO); That function was specifically added to exercise memory allocation errors. The bpf error injection mechanism is not the only one that can generate the errors. Later you renamed BPF_ALLOW_ERROR_INJECTION into ALLOW_ERROR_INJECTION in commit 540adea3809f ("error-injection: Separate error-injection from kprobe"), but the main purpose of "bpf error injection" stayed the same. We didn't mark random kernel functions as 'inject errors here'. Only those whose callers must do sane things in case of errors. So attemp to 'will be used for more kernel internal functions under development' doesn't fit the spirit for bpf error injection as it is today. For this kind of random kernel injection please use some other mechanism. We cannot allow bpf to change return values of random function. As far as users of this [BPF_]ALLOW_ERROR_INJECTION... I couldn't find any blog, article or post that is talking about text interface to tweak return values /sys/kernel/debug/fail_function. Only links to kernel doc. But there are plenty of BPF users of error injection. Like: https://github.com/iovisor/bcc/blob/master/tools/inject_example.txt https://chaos-mesh.org/docs/simulate-kernel-chaos-on-kubernetes/ https://github.com/iovisor/bpftrace/blob/master/docs/reference_guide.md#20-override-override-return-value so we should tailor this 'error injection' facility to actual users and not hypothetical 'more kernel internal functions under development'.
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug index c3c0b077ade3..9ee72d8860c3 100644 --- a/lib/Kconfig.debug +++ b/lib/Kconfig.debug @@ -1874,8 +1874,14 @@ config NETDEV_NOTIFIER_ERROR_INJECT If unsure, say N. config FUNCTION_ERROR_INJECTION - def_bool y + bool "Fault-injections of functions" depends on HAVE_FUNCTION_ERROR_INJECTION && KPROBES + help + Add fault injections into various functions that are annotated with + ALLOW_ERROR_INJECTION() in the kernel. BPF may also modify the return + value of theses functions. This is useful to test error paths of code. + + If unsure, say N config FAULT_INJECTION bool "Fault-injection framework"