error-injection: Add prompt for function error injection

Message ID 20221121104403.1545f9b5@gandalf.local.home
State New
Headers
Series error-injection: Add prompt for function error injection |

Commit Message

Steven Rostedt Nov. 21, 2022, 3:44 p.m. UTC
  From: "Steven Rostedt (Google)" <rostedt@goodmis.org>

The config to be able to inject error codes into any function annotated
with ALLOW_ERROR_INJECTION() is enabled when CONFIG_FUNCTION_ERROR_INJECTION
is enabled. But unfortunately, this is always enabled on x86 when KPROBES
is enabled, and there's no way to turn it off.

As kprobes is useful for observability of the kernel, it is useful to have
it enabled in production environments. But error injection should be
avoided. Add a prompt to the config to allow it to be disabled even when
kprobes is enabled, and get rid of the "def_bool y".

This is a kernel debug feature (it's in Kconfig.debug), and should have
never been something enabled by default.

Cc: stable@vger.kernel.org
Fixes: 540adea3809f6 ("error-injection: Separate error-injection from kprobe")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
 lib/Kconfig.debug | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)
  

Comments

Borislav Petkov Nov. 21, 2022, 7:32 p.m. UTC | #1
On Mon, Nov 21, 2022 at 10:44:03AM -0500, Steven Rostedt wrote:
> From: "Steven Rostedt (Google)" <rostedt@goodmis.org>
> 
> The config to be able to inject error codes into any function annotated
> with ALLOW_ERROR_INJECTION() is enabled when CONFIG_FUNCTION_ERROR_INJECTION
> is enabled. But unfortunately, this is always enabled on x86 when KPROBES
> is enabled, and there's no way to turn it off.
> 
> As kprobes is useful for observability of the kernel, it is useful to have
> it enabled in production environments. But error injection should be
> avoided. Add a prompt to the config to allow it to be disabled even when
> kprobes is enabled, and get rid of the "def_bool y".
> 
> This is a kernel debug feature (it's in Kconfig.debug), and should have
> never been something enabled by default.
> 
> Cc: stable@vger.kernel.org
> Fixes: 540adea3809f6 ("error-injection: Separate error-injection from kprobe")
> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
> ---
>  lib/Kconfig.debug | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)

As stated on another thread, debugging production kernels where folks
have been injecting errors into functions is not something anyone would
like to and have to do. Especially if from looking at system dumps, it
is not even clear what has been injected and why, rendering the system
unstable and the issue probably unreproducible.

Acked-by: Borislav Petkov <bp@suse.de>

Thx.
  
Masami Hiramatsu (Google) Nov. 21, 2022, 10:24 p.m. UTC | #2
On Mon, 21 Nov 2022 10:44:03 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:

> From: "Steven Rostedt (Google)" <rostedt@goodmis.org>
> 
> The config to be able to inject error codes into any function annotated
> with ALLOW_ERROR_INJECTION() is enabled when CONFIG_FUNCTION_ERROR_INJECTION
> is enabled. But unfortunately, this is always enabled on x86 when KPROBES
> is enabled, and there's no way to turn it off.
> 
> As kprobes is useful for observability of the kernel, it is useful to have
> it enabled in production environments. But error injection should be
> avoided. Add a prompt to the config to allow it to be disabled even when
> kprobes is enabled, and get rid of the "def_bool y".
> 
> This is a kernel debug feature (it's in Kconfig.debug), and should have
> never been something enabled by default.
> 

Oops, thanks for update. Yes, it should not be enabled in the production system.

Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>

Thanks,

> Cc: stable@vger.kernel.org
> Fixes: 540adea3809f6 ("error-injection: Separate error-injection from kprobe")
> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
> ---
>  lib/Kconfig.debug | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index c3c0b077ade3..9ee72d8860c3 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -1874,8 +1874,14 @@ config NETDEV_NOTIFIER_ERROR_INJECT
>  	  If unsure, say N.
>  
>  config FUNCTION_ERROR_INJECTION
> -	def_bool y
> +	bool "Fault-injections of functions"
>  	depends on HAVE_FUNCTION_ERROR_INJECTION && KPROBES
> +	help
> +	  Add fault injections into various functions that are annotated with
> +	  ALLOW_ERROR_INJECTION() in the kernel. BPF may also modify the return
> +	  value of theses functions. This is useful to test error paths of code.
> +
> +	  If unsure, say N
>  
>  config FAULT_INJECTION
>  	bool "Fault-injection framework"
> -- 
> 2.35.1
>
  
Alexei Starovoitov Nov. 21, 2022, 11:36 p.m. UTC | #3
On Mon, Nov 21, 2022 at 11:32 AM Borislav Petkov <bp@alien8.de> wrote:
>
> On Mon, Nov 21, 2022 at 10:44:03AM -0500, Steven Rostedt wrote:
> > From: "Steven Rostedt (Google)" <rostedt@goodmis.org>
> >
> > The config to be able to inject error codes into any function annotated
> > with ALLOW_ERROR_INJECTION() is enabled when CONFIG_FUNCTION_ERROR_INJECTION
> > is enabled. But unfortunately, this is always enabled on x86 when KPROBES
> > is enabled, and there's no way to turn it off.
> >
> > As kprobes is useful for observability of the kernel, it is useful to have
> > it enabled in production environments. But error injection should be
> > avoided. Add a prompt to the config to allow it to be disabled even when
> > kprobes is enabled, and get rid of the "def_bool y".
> >
> > This is a kernel debug feature (it's in Kconfig.debug), and should have
> > never been something enabled by default.
> >
> > Cc: stable@vger.kernel.org
> > Fixes: 540adea3809f6 ("error-injection: Separate error-injection from kprobe")
> > Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
> > ---
> >  lib/Kconfig.debug | 8 +++++++-
> >  1 file changed, 7 insertions(+), 1 deletion(-)
>
> As stated on another thread, debugging production kernels where folks
> have been injecting errors into functions is not something anyone would
> like to and have to do. Especially if from looking at system dumps, it
> is not even clear what has been injected and why, rendering the system
> unstable and the issue probably unreproducible.
>
> Acked-by: Borislav Petkov <bp@suse.de>

The commit log is bogus and the lack of understanding what
bpf and error injection hooks are used for expressed
in this thread is quite sad.
Disabling error injection makes the system _less_ secure.
But giving people an option to reduce security is a decision
that every distro and data center has to make on their own.
  
Masami Hiramatsu (Google) Nov. 22, 2022, 12:09 a.m. UTC | #4
On Mon, 21 Nov 2022 15:36:08 -0800
Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:

> On Mon, Nov 21, 2022 at 11:32 AM Borislav Petkov <bp@alien8.de> wrote:
> >
> > On Mon, Nov 21, 2022 at 10:44:03AM -0500, Steven Rostedt wrote:
> > > From: "Steven Rostedt (Google)" <rostedt@goodmis.org>
> > >
> > > The config to be able to inject error codes into any function annotated
> > > with ALLOW_ERROR_INJECTION() is enabled when CONFIG_FUNCTION_ERROR_INJECTION
> > > is enabled. But unfortunately, this is always enabled on x86 when KPROBES
> > > is enabled, and there's no way to turn it off.
> > >
> > > As kprobes is useful for observability of the kernel, it is useful to have
> > > it enabled in production environments. But error injection should be
> > > avoided. Add a prompt to the config to allow it to be disabled even when
> > > kprobes is enabled, and get rid of the "def_bool y".
> > >
> > > This is a kernel debug feature (it's in Kconfig.debug), and should have
> > > never been something enabled by default.
> > >
> > > Cc: stable@vger.kernel.org
> > > Fixes: 540adea3809f6 ("error-injection: Separate error-injection from kprobe")
> > > Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
> > > ---
> > >  lib/Kconfig.debug | 8 +++++++-
> > >  1 file changed, 7 insertions(+), 1 deletion(-)
> >
> > As stated on another thread, debugging production kernels where folks
> > have been injecting errors into functions is not something anyone would
> > like to and have to do. Especially if from looking at system dumps, it
> > is not even clear what has been injected and why, rendering the system
> > unstable and the issue probably unreproducible.
> >
> > Acked-by: Borislav Petkov <bp@suse.de>
> 
> The commit log is bogus and the lack of understanding what
> bpf and error injection hooks are used for expressed
> in this thread is quite sad.
> Disabling error injection makes the system _less_ secure.

Why? I thought this was only used for testing. Or, are you
using this for changing the kernel behavior in production
environment?

For example, the commit 540adea3809f6 ("error-injection: Separate
error-injection from kprobe") specifies that some btrfs functions
to whitelist, which is I thought only for the testing btrfs.

Now it seems more functions related to syscalls registered to
the whitelist. (I didn't notice that...) If it is intended to
filter syscalls, I recommend you to use secomp instead of this.

> But giving people an option to reduce security is a decision
> that every distro and data center has to make on their own.

This function-level override should be used carefully just for
testing linux kernel code. For forcibly filtering some functionality,
it should use LSM or have another facility based on jump label.
(yeah, if it is for security, why can you use LSM?)


Thank you,
  
Steven Rostedt Nov. 22, 2022, 12:24 a.m. UTC | #5
On Mon, 21 Nov 2022 15:36:08 -0800
Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:

> The commit log is bogus and the lack of understanding what
> bpf and error injection hooks are used for expressed
> in this thread is quite sad.
> Disabling error injection makes the system _less_ secure.

Please specify.

As Masami replied, you are abusing this feature for some arcane way to do
security. It's "error injection" how does enabling this improve security???

-- Steve


> But giving people an option to reduce security is a decision
> that every distro and data center has to make on their own.
  
Steven Rostedt Nov. 22, 2022, 12:40 a.m. UTC | #6
On Mon, 21 Nov 2022 19:24:23 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:

> On Mon, 21 Nov 2022 15:36:08 -0800
> Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> 
> > The commit log is bogus and the lack of understanding what
> > bpf and error injection hooks are used for expressed
> > in this thread is quite sad.
> > Disabling error injection makes the system _less_ secure.  
> 
> Please specify.
> 
> As Masami replied, you are abusing this feature for some arcane way to do
> security. It's "error injection" how does enabling this improve security???
>

If you want to add BPF programs to determine who or what can access various
functions, then please work with the security folks and hook into their
infrastructure. Please do not make some home grown operations on top of an
interface that was not created for this purpose. That can only lead to
unexpected consequences.

-- Steve
  
Borislav Petkov Nov. 22, 2022, 10:39 a.m. UTC | #7
On Mon, Nov 21, 2022 at 03:36:08PM -0800, Alexei Starovoitov wrote:
> The commit log is bogus and the lack of understanding what

You mean that:

Documentation/fault-injection/fault-injection.rst

?

I don't want any of that possible in production setups. And until you
give me a sane argument why it is good to have in production setups
generically, this is end of story.
  
Chris Mason Nov. 22, 2022, 5:42 p.m. UTC | #8
On 11/22/22 5:39 AM, Borislav Petkov wrote:
> On Mon, Nov 21, 2022 at 03:36:08PM -0800, Alexei Starovoitov wrote:
>> The commit log is bogus and the lack of understanding what
> 
> You mean that:
> 
> Documentation/fault-injection/fault-injection.rst
> 
> ?
> 
> I don't want any of that possible in production setups. And until you
> give me a sane argument why it is good to have in production setups
> generically, this is end of story.
> 

I think there are a few different sides to this:

- it makes total sense that we all have wildly different ideas about
which tools should be available in prod.  Making this decision more fine
grained seems reasonable.

- fault injection for testing: we have a stage of qualification that
does error injection against the prod kernel.  It helps to have this
against the debug kernel too, but that misses some races etc.  I always
just assumed distros and partners did some fault injection tests against
the prod kernel builds?

- fault injection for debugging:  it doesn't happen often but at some
point we run out of ideas and start making different functions fail in
prod to figure out why we're not prodding.

- overriding return values for security fixes: also not a common thing,
but it's a tool we've used.  There are usually better long term fixes,
but it happens.

Stepping back to the big picture of debugging systems with bpf in use, I
love hearing (and telling) stories of debugging difficult problems.  As
far as I know, BPF telling lies hasn't really been a problem for us, so
even though it's a huge tangent, if you have specific examples of
problems you've seen, I'm really interested in hearing more.

When I talk about production, both overall stability and validating new
kernels, if I compare the BPF subsystem with MM, filesystems, cgroups,
the scheduler, networking, and all things Jens, the systems BPF
developers put in place are working really well for me.

If I expand the discussion to the BPF programs themselves, there have
been rare issues.   Still completely on par with the rest of the kernel
subsystems and within the noise in comparison with hardware failures.

In other words, I really do care about the concerns you're expressing
here, and I'm usually first in line to complain when random people make
my job harder.  I'm just not seeing these issues with BPF, and I see
them actively trying to increase safety over time.

-chris
  
Borislav Petkov Nov. 22, 2022, 6:16 p.m. UTC | #9
On Tue, Nov 22, 2022 at 12:42:33PM -0500, Chris Mason wrote:
> I think there are a few different sides to this:
> 
> - it makes total sense that we all have wildly different ideas about
> which tools should be available in prod.  Making this decision more fine
> grained seems reasonable.
> 
> - fault injection for testing: we have a stage of qualification that
> does error injection against the prod kernel.  It helps to have this
> against the debug kernel too, but that misses some races etc.  I always
> just assumed distros and partners did some fault injection tests against
> the prod kernel builds?

That's what the debug kernel flavor is for. At least on SLES.

That's why we have the MCE injection module in the debug flavor and not
in the production one. For the very same reason.

> - overriding return values for security fixes: also not a common thing,
> but it's a tool we've used.  There are usually better long term fixes,
> but it happens.

Yeah, that's what live patching is for.

> In other words, I really do care about the concerns you're expressing
> here, and I'm usually first in line to complain when random people make
> my job harder.  I'm just not seeing these issues with BPF, and I see
> them actively trying to increase safety over time.

So this might be your opinion and I respect it but your first paragraph
was spot on: to *have* the option to decide whether a company wants to
support that in production or not.

I'm sure it makes sense for you in your production scenarios but it
doesn't for us. At least not at this point.

And I think this should be disabled in our kernels for now. When the
team decides someday that they wanna deal with bug reports of people
doing fault injection, then sure by all means.
  
Steven Rostedt Nov. 22, 2022, 6:29 p.m. UTC | #10
On Tue, 22 Nov 2022 12:42:33 -0500
Chris Mason <clm@meta.com> wrote:

> On 11/22/22 5:39 AM, Borislav Petkov wrote:
> > On Mon, Nov 21, 2022 at 03:36:08PM -0800, Alexei Starovoitov wrote:  
> >> The commit log is bogus and the lack of understanding what  
> > 
> > You mean that:
> > 
> > Documentation/fault-injection/fault-injection.rst
> > 
> > ?
> > 
> > I don't want any of that possible in production setups. And until you
> > give me a sane argument why it is good to have in production setups
> > generically, this is end of story.
> >   
> 
> I think there are a few different sides to this:
> 
> - it makes total sense that we all have wildly different ideas about
> which tools should be available in prod.  Making this decision more fine
> grained seems reasonable.
> 
> - fault injection for testing: we have a stage of qualification that
> does error injection against the prod kernel.  It helps to have this
> against the debug kernel too, but that misses some races etc.  I always
> just assumed distros and partners did some fault injection tests against
> the prod kernel builds?
> 
> - fault injection for debugging:  it doesn't happen often but at some
> point we run out of ideas and start making different functions fail in
> prod to figure out why we're not prodding.

As you have stated, we have different ideas for production. Your POV is
cloud based (as is with other parts of my company). But my POV is
Chromebooks where production means what's on a user's device. There's no
reason to ever have fault injection enabled in such cases. I would assume
that distributions are the same. But having kprobes for visibility can also
be useful for debugging purposes, even in the field.

> 
> - overriding return values for security fixes: also not a common thing,
> but it's a tool we've used.  There are usually better long term fixes,
> but it happens.
> 
> Stepping back to the big picture of debugging systems with bpf in use, I
> love hearing (and telling) stories of debugging difficult problems.  As
> far as I know, BPF telling lies hasn't really been a problem for us, so
> even though it's a huge tangent, if you have specific examples of
> problems you've seen, I'm really interested in hearing more.
> 
> When I talk about production, both overall stability and validating new
> kernels, if I compare the BPF subsystem with MM, filesystems, cgroups,
> the scheduler, networking, and all things Jens, the systems BPF
> developers put in place are working really well for me.
> 
> If I expand the discussion to the BPF programs themselves, there have
> been rare issues.   Still completely on par with the rest of the kernel
> subsystems and within the noise in comparison with hardware failures.
> 
> In other words, I really do care about the concerns you're expressing
> here, and I'm usually first in line to complain when random people make
> my job harder.  I'm just not seeing these issues with BPF, and I see
> them actively trying to increase safety over time.

I'm sure you are not seeing theses issues with BPF, as the main developers
and you have the same focus areas.

I have no problem with the concept of BPF. My concern is mostly the
development side of it. As you can basically attach functionality to
arbitrary points in the kernel via BPF programs, the perception is that
anything that is available is fair game. BPF tends to expand features
beyond their intended usage. Heck, look at the name itself. "extended
Berkeley Packet Filter", were eBPF has nothing to do with packet filtering
anymore. Perhaps it should be renamed to CUST (Compiled Use Space
Trampoline) ;-)

Alexei said it's "sad" about my expression of BPF and error injection. If
it has to do with security, then I would like to see more collaboration
with the security folks and perhaps have BPF integrate with their
infrastructure. But the usual response is "that's not fast enough for me"
and then something is done from scratch without working with that
subsystem to make it fast enough. Yes, it takes more time to collaborate
than just doing it on your own. But that's the nature of an open source
*community*.

-- Steve
  
Chris Mason Nov. 22, 2022, 7:51 p.m. UTC | #11
On 11/22/22 1:29 PM, Steven Rostedt wrote:
> On Tue, 22 Nov 2022 12:42:33 -0500
> Chris Mason <clm@meta.com> wrote:
> 
>> On 11/22/22 5:39 AM, Borislav Petkov wrote:
>>> On Mon, Nov 21, 2022 at 03:36:08PM -0800, Alexei Starovoitov wrote:  
>>>> The commit log is bogus and the lack of understanding what  
>>>
>>> You mean that:
>>>
>>> Documentation/fault-injection/fault-injection.rst
>>>
>>> ?
>>>
>>> I don't want any of that possible in production setups. And until you
>>> give me a sane argument why it is good to have in production setups
>>> generically, this is end of story.
>>>   
>>
>> I think there are a few different sides to this:
>>
>> - it makes total sense that we all have wildly different ideas about
>> which tools should be available in prod.  Making this decision more fine
>> grained seems reasonable.
>>
>> - fault injection for testing: we have a stage of qualification that
>> does error injection against the prod kernel.  It helps to have this
>> against the debug kernel too, but that misses some races etc.  I always
>> just assumed distros and partners did some fault injection tests against
>> the prod kernel builds?
>>
>> - fault injection for debugging:  it doesn't happen often but at some
>> point we run out of ideas and start making different functions fail in
>> prod to figure out why we're not prodding.
> 
> As you have stated, we have different ideas for production. Your POV is
> cloud based (as is with other parts of my company). But my POV is
> Chromebooks where production means what's on a user's device. There's no
> reason to ever have fault injection enabled in such cases. I would assume
> that distributions are the same. But having kprobes for visibility can also
> be useful for debugging purposes, even in the field.
> 

Yeah, I definitely don't have opinions on the right way to build a
chromebook, and replying to Boris, only slightly better at distros.
Josef's original intent was this be easy to turn off.

>>
>> - overriding return values for security fixes: also not a common thing,
>> but it's a tool we've used.  There are usually better long term fixes,
>> but it happens.
>>
>> Stepping back to the big picture of debugging systems with bpf in use, I
>> love hearing (and telling) stories of debugging difficult problems.  As
>> far as I know, BPF telling lies hasn't really been a problem for us, so
>> even though it's a huge tangent, if you have specific examples of
>> problems you've seen, I'm really interested in hearing more.
>>
>> When I talk about production, both overall stability and validating new
>> kernels, if I compare the BPF subsystem with MM, filesystems, cgroups,
>> the scheduler, networking, and all things Jens, the systems BPF
>> developers put in place are working really well for me.
>>
>> If I expand the discussion to the BPF programs themselves, there have
>> been rare issues.   Still completely on par with the rest of the kernel
>> subsystems and within the noise in comparison with hardware failures.
>>
>> In other words, I really do care about the concerns you're expressing
>> here, and I'm usually first in line to complain when random people make
>> my job harder.  I'm just not seeing these issues with BPF, and I see
>> them actively trying to increase safety over time.
> 
> I'm sure you are not seeing theses issues with BPF, as the main developers
> and you have the same focus areas.
> 
> I have no problem with the concept of BPF. My concern is mostly the
> development side of it. As you can basically attach functionality to
> arbitrary points in the kernel via BPF programs, the perception is that
> anything that is available is fair game. BPF tends to expand features
> beyond their intended usage. Heck, look at the name itself. "extended
> Berkeley Packet Filter", were eBPF has nothing to do with packet filtering
> anymore. Perhaps it should be renamed to CUST (Compiled Use Space
> Trampoline) ;-)

Developers in general tend to stretch interfaces a lot.  At some point
the friction of using the interface is worse than the friction of
changing it, and things get redone.  At the end of the day, BPF
developers are still kernel developers and we end up with relatively
sane feedback loops.

> 
> Alexei said it's "sad" about my expression of BPF and error injection. If
> it has to do with security, then I would like to see more collaboration
> with the security folks and perhaps have BPF integrate with their
> infrastructure.

Now is a great time to grab KP and hear all about BPF LSM.

> But the usual response is "that's not fast enough for me"
> and then something is done from scratch without working with that
> subsystem to make it fast enough. Yes, it takes more time to collaborate
> than just doing it on your own. But that's the nature of an open source
> *community*.

One of the awkward and wonderful parts of our community is that none of
us have the same goals or needs.  Going back to the original thread, ARM
has either one or two different live patching subsystems in use in the
industry, and neither is upstream.

One reason you end up having these arguments often with BPF is because
they stick around and work with the community to upstream their work.
The tradeoffs, compromises and decisions aren't always what you want,
but we all show up every day and keep engaging.

-chris
  
Andrew Morton Nov. 30, 2022, 10:37 p.m. UTC | #12
On Tue, 22 Nov 2022 14:51:08 -0500 Chris Mason <clm@meta.com> wrote:

> On 11/22/22 1:29 PM, Steven Rostedt wrote:
> > On Tue, 22 Nov 2022 12:42:33 -0500
> > Chris Mason <clm@meta.com> wrote:
> > 
> >> On 11/22/22 5:39 AM, Borislav Petkov wrote:
> >>> On Mon, Nov 21, 2022 at 03:36:08PM -0800, Alexei Starovoitov wrote:  
> >>>> The commit log is bogus and the lack of understanding what  

Why am I not understanding the controversy here?  With this patch
applied, people who want function error injection enable
CONFIG_FUNCTION_ERROR_INJECTION and people who don't want it don't do
that.

Alexei, can you please suggest a less bogus changelog for this?
  
Masami Hiramatsu (Google) Dec. 1, 2022, 2:41 p.m. UTC | #13
Hi,

On Tue, 22 Nov 2022 12:42:33 -0500
Chris Mason <clm@meta.com> wrote:
> 
> - fault injection for testing: we have a stage of qualification that
> does error injection against the prod kernel.  It helps to have this
> against the debug kernel too, but that misses some races etc.  I always
> just assumed distros and partners did some fault injection tests against
> the prod kernel builds?
> 
> - fault injection for debugging:  it doesn't happen often but at some
> point we run out of ideas and start making different functions fail in
> prod to figure out why we're not prodding.

For those purpose, isn't it enough to add a taint flag for the fault
injection? This will help us to identify that the kernel is possible
to be under debug mode.

> - overriding return values for security fixes: also not a common thing,
> but it's a tool we've used.  There are usually better long term fixes,
> but it happens.

I don't recommend to use the fault injection for this purpose. For fixing
the security issue online, you should use livepatch.

Thank you,
  
Alexei Starovoitov Dec. 1, 2022, 4:58 p.m. UTC | #14
On Wed, Nov 30, 2022 at 2:37 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Tue, 22 Nov 2022 14:51:08 -0500 Chris Mason <clm@meta.com> wrote:
>
> > On 11/22/22 1:29 PM, Steven Rostedt wrote:
> > > On Tue, 22 Nov 2022 12:42:33 -0500
> > > Chris Mason <clm@meta.com> wrote:
> > >
> > >> On 11/22/22 5:39 AM, Borislav Petkov wrote:
> > >>> On Mon, Nov 21, 2022 at 03:36:08PM -0800, Alexei Starovoitov wrote:
> > >>>> The commit log is bogus and the lack of understanding what
>
> Why am I not understanding the controversy here?  With this patch
> applied, people who want function error injection enable
> CONFIG_FUNCTION_ERROR_INJECTION and people who don't want it don't do
> that.
>
> Alexei, can you please suggest a less bogus changelog for this?

People are using ALLOW_ERROR_INJECTION to allowlist kernel functions
where bpf progs can attach.
For example in the linux-next:
git grep ALLOW_ERROR_INJECTION
drivers/hid/bpf/hid_bpf_dispatch.c:ALLOW_ERROR_INJECTION(hid_bpf_device_event,
ERRNO);
drivers/hid/bpf/hid_bpf_dispatch.c:ALLOW_ERROR_INJECTION(hid_bpf_rdesc_fixup,
ERRNO);
drivers/hid/bpf/hid_bpf_jmp_table.c:ALLOW_ERROR_INJECTION(__hid_bpf_tail_call,
ERRNO);

The hid-bpf framework depends on it.
iirc Benjamin mentioned that chromeos is one of the future users
of hid-bpf. They need it to deal with a variety of quirks in hid
devices in production.

Either hid-bpf or bpf core can add
"depends on FUNCTION_ERROR_INJECTION"
to its kconfig.
Like:
diff --git a/kernel/bpf/Kconfig b/kernel/bpf/Kconfig
index 2dfe1079f772..281e5263f3d1 100644
--- a/kernel/bpf/Kconfig
+++ b/kernel/bpf/Kconfig
@@ -32,6 +32,7 @@ config BPF_SYSCALL
        select BINARY_PRINTF
        select NET_SOCK_MSG if NET
        select PAGE_POOL if NET
+       depends on FUNCTION_ERROR_INJECTION
        default n

but the better option for now would be to drop this patch.
For the next next merge window we can come up with alternative way
(instead of ALLOW_ERROR_INJECTION) to mark kernel functions
purely on the bpf side.
I don't think we have time to add this marking infrastructure
for the upcoming merge window.
  
Benjamin Tissoires Dec. 1, 2022, 5:39 p.m. UTC | #15
On Thu, Dec 1, 2022 at 5:59 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Wed, Nov 30, 2022 at 2:37 PM Andrew Morton <akpm@linux-foundation.org> wrote:
> >
> > On Tue, 22 Nov 2022 14:51:08 -0500 Chris Mason <clm@meta.com> wrote:
> >
> > > On 11/22/22 1:29 PM, Steven Rostedt wrote:
> > > > On Tue, 22 Nov 2022 12:42:33 -0500
> > > > Chris Mason <clm@meta.com> wrote:
> > > >
> > > >> On 11/22/22 5:39 AM, Borislav Petkov wrote:
> > > >>> On Mon, Nov 21, 2022 at 03:36:08PM -0800, Alexei Starovoitov wrote:
> > > >>>> The commit log is bogus and the lack of understanding what
> >
> > Why am I not understanding the controversy here?  With this patch
> > applied, people who want function error injection enable
> > CONFIG_FUNCTION_ERROR_INJECTION and people who don't want it don't do
> > that.
> >
> > Alexei, can you please suggest a less bogus changelog for this?
>
> People are using ALLOW_ERROR_INJECTION to allowlist kernel functions
> where bpf progs can attach.
> For example in the linux-next:
> git grep ALLOW_ERROR_INJECTION
> drivers/hid/bpf/hid_bpf_dispatch.c:ALLOW_ERROR_INJECTION(hid_bpf_device_event,
> ERRNO);
> drivers/hid/bpf/hid_bpf_dispatch.c:ALLOW_ERROR_INJECTION(hid_bpf_rdesc_fixup,
> ERRNO);
> drivers/hid/bpf/hid_bpf_jmp_table.c:ALLOW_ERROR_INJECTION(__hid_bpf_tail_call,
> ERRNO);

To be completely honest, I mark hid_bpf_device_event and
hid_bpf_rdesc_fixup as ALLOW_ERROR_INJECTION only to have a dedicated
function to create a SEC("fmod_ret"), but I am actually only using
__hid_bpf_tail_call() to call the bpf programs.

So technically, I should be able to not use ALLOW_ERROR_INJECTION but
that would mean manually calling the BPF programs from
__hid_bpf_tail_call() instead of relying on the magic of libbpf, but
also would force me to have an other way of telling these 2 other
functions are fmod_ret capable, which would be definitely not clean.

>
> The hid-bpf framework depends on it.
> iirc Benjamin mentioned that chromeos is one of the future users
> of hid-bpf. They need it to deal with a variety of quirks in hid
> devices in production.
>
> Either hid-bpf or bpf core can add
> "depends on FUNCTION_ERROR_INJECTION"
> to its kconfig.
> Like:
> diff --git a/kernel/bpf/Kconfig b/kernel/bpf/Kconfig
> index 2dfe1079f772..281e5263f3d1 100644
> --- a/kernel/bpf/Kconfig
> +++ b/kernel/bpf/Kconfig
> @@ -32,6 +32,7 @@ config BPF_SYSCALL
>         select BINARY_PRINTF
>         select NET_SOCK_MSG if NET
>         select PAGE_POOL if NET
> +       depends on FUNCTION_ERROR_INJECTION
>         default n

FWIW, this is what I'm going to apply in hid.git for the time being
[0]. But I'd rather have a BPF_HAVE_FMOD_RET as suggested in [1].

>
> but the better option for now would be to drop this patch.
> For the next next merge window we can come up with alternative way
> (instead of ALLOW_ERROR_INJECTION) to mark kernel functions
> purely on the bpf side.
> I don't think we have time to add this marking infrastructure
> for the upcoming merge window.
>

Outside of the "should we add ALLOW_ERROR_INJECTION in production
environments", all I care about is that I want to be able to attach
SEC("fmod_ret/...") on a specific set of functions that I control. So
for me, I don't need to full "let's randomly add errors in any
functions" (which I doubt you can do with ALLOW_ERROR_INJECTION
anyway), I just need to be able to say "I want a bpf program to be
able to change the return code of this dedicated function".

So I agree with Alexei here. The situation has been to enable this
parameter for quite some time without any complaints, and this patch
prevents HID-BPF to be enabled on systems where sysadmins would think
this is unsafe. Postponing this patch to the next merge window will
give enough time for the BPF folks to change their implementation.

Cheers,
Benjamin

[0] https://lore.kernel.org/linux-input/7df26319-f4ee-6dd1-a1b8-1caaf595528d@nvidia.com/T/#m9416ad54e2ef63244585c4ef83d07bebedf6e143
[1] https://lore.kernel.org/linux-input/CABRcYmKyRchQhabi1Vd9RcMQFCcb=EtWyEbFDFRTc-L-U8WhgA@mail.gmail.com/
  
Andrew Morton Dec. 1, 2022, 9:12 p.m. UTC | #16
On Thu, 1 Dec 2022 08:58:55 -0800 Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:

> but the better option for now would be to drop this patch.
> For the next next merge window we can come up with alternative way
> (instead of ALLOW_ERROR_INJECTION) to mark kernel functions
> purely on the bpf side.
> I don't think we have time to add this marking infrastructure
> for the upcoming merge window.

OK, thanks, dropped.  Let's revisit this next cycle.
  
Linus Torvalds Dec. 1, 2022, 9:13 p.m. UTC | #17
On Thu, Dec 1, 2022 at 8:59 AM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> The hid-bpf framework depends on it.

Ok, this is completely unacceptably disgusting hack.

That needs fixing.

> Either hid-bpf or bpf core can add
> "depends on FUNCTION_ERROR_INJECTION"

No, it needs to be narrowed down a lot. Nobody sane wants error
injection just because they want some random HID thing.

And no, BPF shouldn't need it either.

This needs to be narrowed down to the point where HID can say "I want
*this* particular call to be able to be a bpf call.

Stop this crazy "bpf / hid needs error injection" when that then
implies a _lot_ more than that, plus is documented to be something
entirely different anyway.

I realize that HID has mis-used the "we could just use error injection
here to instead insert random bpf code", but that's a complete hack.

Plus it seems to happily not even have made it into mainline anyway,
and only exists in linux-next. Let's head that disgusting hack off at
the pass.

I'm going to apply Steven's patch, because honestly, we need to fix
this disgusting mess *before* it gets to mainline, rather than say
"oh, we already have broken users in next, so let's bend over
backwards for that".

The code is called "error injection", not "random bpf extension"

               Linus
  
Jiri Kosina Dec. 2, 2022, 12:46 a.m. UTC | #18
On Thu, 1 Dec 2022, Linus Torvalds wrote:

> > The hid-bpf framework depends on it.
> 
> Ok, this is completely unacceptably disgusting hack.
> 
> That needs fixing.
> 
> > Either hid-bpf or bpf core can add
> > "depends on FUNCTION_ERROR_INJECTION"
> 
> No, it needs to be narrowed down a lot. Nobody sane wants error
> injection just because they want some random HID thing.
> 
> And no, BPF shouldn't need it either.
> 
> This needs to be narrowed down to the point where HID can say "I want
> *this* particular call to be able to be a bpf call.
> 
> Stop this crazy "bpf / hid needs error injection" when that then
> implies a _lot_ more than that, plus is documented to be something
> entirely different anyway.
> 
> I realize that HID has mis-used the "we could just use error injection
> here to instead insert random bpf code", but that's a complete hack.
> 
> Plus it seems to happily not even have made it into mainline anyway,
> and only exists in linux-next. Let's head that disgusting hack off at
> the pass.
> 
> I'm going to apply Steven's patch, because honestly, we need to fix
> this disgusting mess *before* it gets to mainline, rather than say
> "oh, we already have broken users in next, so let's bend over
> backwards for that".
> 
> The code is called "error injection", not "random bpf extension"

Seems like quite a few parallel threads are currently going on about this, 
so it's a little bit hard to catch up for me as I am apparently CCed only 
on some of them.

Anyway, I believe [1] that ERROR_INJECTION has been designed as a 
debugging feature in the first place, and should stay so. After figuring 
out now that HID-BPF actually has hard dependence on it, I fully agree [2] 
that the series should be ditched for 6.2 and will work with Benjamin to 
have it removed from current hid.git#for-next.

[1] https://lore.kernel.org/all/nycvar.YEU.7.76.2211211716270.27249@gjva.wvxbf.pm/
[2] https://lore.kernel.org/lkml/nycvar.YFH.7.76.2212020135390.6045@cbobk.fhfr.pm/
  
Linus Torvalds Dec. 2, 2022, 12:57 a.m. UTC | #19
On Thu, Dec 1, 2022 at 4:46 PM Jiri Kosina <jikos@kernel.org> wrote:
>
> Anyway, I believe [1] that ERROR_INJECTION has been designed as a
> debugging feature in the first place, and should stay so. After figuring
> out now that HID-BPF actually has hard dependence on it, I fully agree [2]
> that the series should be ditched for 6.2 and will work with Benjamin to
> have it removed from current hid.git#for-next.

I do think that it is interesting to have a "let's have a bpf
insertion hook here", so I'm not against the _concept_ of HID doing
that.

It's not so different from user-mode drivers, after all, which we also
have. A kind of half-way state where we have a kernel driver, but one
that may need custom site-specific (or machine-specific) tweaks.

So I don't want to come across as being against having bpf used for
tuning some HID issue (and I can imagine it making sense in other
places that have machine-specific tweaks - I'm thinking of all the
thermal probe or pincontrol mess where sometimes you have GPIO's or
motherboard thermal sensors etc that are literally "user connected it
to X").

But the notion that we'd use some error injection framework for it,
and that you'd mix those concepts up - *that* I really think is just
horrendous.

Because even if you end up using some common infrastructure code, we
really should separate things out much better.

              Linus
  
Jiri Kosina Dec. 2, 2022, 1:03 a.m. UTC | #20
On Thu, 1 Dec 2022, Linus Torvalds wrote:

> > Anyway, I believe [1] that ERROR_INJECTION has been designed as a
> > debugging feature in the first place, and should stay so. After figuring
> > out now that HID-BPF actually has hard dependence on it, I fully agree [2]
> > that the series should be ditched for 6.2 and will work with Benjamin to
> > have it removed from current hid.git#for-next.
> 
> I do think that it is interesting to have a "let's have a bpf
> insertion hook here", so I'm not against the _concept_ of HID doing
> that.

Absolutely, me neither, quite the contrary -- I am quite happy to see 
HID-BPF happening, because it'll actually make life easier for everybody: 
for people with quirky hardware (trivial testing of fixes), for kernel 
developers (trivial testing of fixes), and for distributions (trivial 
distribution of fixes).

> It's not so different from user-mode drivers, after all, which we also 
> have. A kind of half-way state where we have a kernel driver, but one 
> that may need custom site-specific (or machine-specific) tweaks.

Indeed. The whole rationale from Benjamin, explaining quite nicely why 
HID-BPF is a good thing, can be found in the very original, initial 
ancient cover letter:

	https://lore.kernel.org/lkml/20220224110828.2168231-1-benjamin.tissoires@redhat.com/

> So I don't want to come across as being against having bpf used for 
> tuning some HID issue (and I can imagine it making sense in other places 
> that have machine-specific tweaks - I'm thinking of all the thermal 
> probe or pincontrol mess where sometimes you have GPIO's or motherboard 
> thermal sensors etc that are literally "user connected it to X").
> 
> But the notion that we'd use some error injection framework for it,
> and that you'd mix those concepts up - *that* I really think is just
> horrendous.

Fully agreed. I unfortunately missed that particular aspect during review, 
and it popped up only after HID-BPF appeared in linux-next.
  
Steven Rostedt Dec. 2, 2022, 1:32 a.m. UTC | #21
On Fri, 2 Dec 2022 02:03:03 +0100 (CET)
Jiri Kosina <jikos@kernel.org> wrote:

> On Thu, 1 Dec 2022, Linus Torvalds wrote:
> 
> > > Anyway, I believe [1] that ERROR_INJECTION has been designed as a
> > > debugging feature in the first place, and should stay so. After figuring
> > > out now that HID-BPF actually has hard dependence on it, I fully agree [2]
> > > that the series should be ditched for 6.2 and will work with Benjamin to
> > > have it removed from current hid.git#for-next.  
> > 
> > I do think that it is interesting to have a "let's have a bpf
> > insertion hook here", so I'm not against the _concept_ of HID doing
> > that.  
> 
> Absolutely, me neither, quite the contrary -- I am quite happy to see 
> HID-BPF happening, because it'll actually make life easier for everybody: 
> for people with quirky hardware (trivial testing of fixes), for kernel 
> developers (trivial testing of fixes), and for distributions (trivial 
> distribution of fixes).

Full disclosure, I'm not against a bpf_hook either. In fact, I think I even
stated something to that effect, like adding a bpf_hook annotation to
functions or whatever, so that people can plainly see that the function can
have bpf attached to it.

I just *hate* the ad hoc way of using infrastructure for other purposes
than what they were designed for.

-- Steve
  
Alexei Starovoitov Dec. 2, 2022, 1:41 a.m. UTC | #22
On Thu, Dec 01, 2022 at 01:13:35PM -0800, Linus Torvalds wrote:
> 
> I'm going to apply Steven's patch, because honestly, we need to fix
> this disgusting mess *before* it gets to mainline, rather than say
> "oh, we already have broken users in next, so let's bend over
> backwards for that".
> 
> The code is called "error injection", not "random bpf extension"

Right, ALLOW_ERROR_INJECTION doesn't fit hid-bpf.
As Benjamin clarified for hid-bpf we don't want non-bpf attach to
those 3 functions. Injecting errors there is not useful.
We'll come with some clean mechanism to express "attach bpf here".

But let's examine other places of "error injection".

The following are clearly falling into kernel debugging category:
mm/filemap.c:ALLOW_ERROR_INJECTION(__filemap_add_folio, ERRNO);
mm/page_alloc.c:ALLOW_ERROR_INJECTION(should_fail_alloc_page, TRUE);
mm/slab_common.c:ALLOW_ERROR_INJECTION(should_failslab, ERRNO);

The distros might disable such hooks while data centers might still
enable them. Consider chaosmonkey and other frameworks that stress
user space. They are used on non-production user space while
running on production kernel.
The cloud providers wouldn't want to spin another kernel flavor
with fault injection enabled just to satisfy this set of users.
So the FUNCTION_ERROR_INJECTION kconfig is absolutely necessary.
Whether it defaults to N or Y, doesn't matter much.

But where would you draw the line for:
include/linux/syscalls.h: ALLOW_ERROR_INJECTION(sys_##sname, ERRNO);

Right now people can add to their .bashrc:

cd /sys/kernel/debug/fail_function/
echo __x64_sys_bpf > inject
echo 0xffffFFFF > times
echo 100 > probability
echo 0 > verbose

and stop their favorite syscall ever happening on their laptops after boot.

The fault injection framework disables individual syscall with zero performance
overhead comparing to LSM and seccomp mechanisms.
BPF is not involved here. It's a kprobe in one spot.
All other syscalls don't notice it.
It's an attractive way to improve security.

A BPF prog over syscall can filter by user, cgroup, task and give fine grain
control over security surface.
tbh I'm not aware of folks doing "syscall disabling" through command line like
above (I've only seen it through bpf), but it doesn't mean that somebody will
not start complaining that their script broke, because distro disabled fault
injection.

So should we split FUNCTION_ERROR_INJECTION kconfig into two ?
And do default N for things like should_failslab() and
default Y for syscalls?

In the other thread TAINT_FAULT_INJECTED was proposed.
I think it's fine to taint when injecting errors into should_failslab(),
but tainting when syscall is disabled feels wrong.

One can argue that ALLOW_ERROR_INJECTION(sys_##sname, ERRNO); was an abuse of
fault injection, but let's keep it aside and focus on moving forward from here.
  
Benjamin Tissoires Dec. 2, 2022, 2:55 p.m. UTC | #23
On 12/1/22 22:13, Linus Torvalds wrote:
> On Thu, Dec 1, 2022 at 8:59 AM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
>>
>> The hid-bpf framework depends on it.
> 
> Ok, this is completely unacceptably disgusting hack.

[foreword: I have read the other replies, just replying to this one
because it is the explicit ask for a fix]

> 
> That needs fixing.
> 
>> Either hid-bpf or bpf core can add
>> "depends on FUNCTION_ERROR_INJECTION"
> 
> No, it needs to be narrowed down a lot. Nobody sane wants error
> injection just because they want some random HID thing.
> 
> And no, BPF shouldn't need it either.
> 
> This needs to be narrowed down to the point where HID can say "I want
> *this* particular call to be able to be a bpf call.

So, would the following be OK? I still have a few concerns I'll explain
after the patch.

---
 From 1290561304eb3e48e1e6d727def13c16698a26f1 Mon Sep 17 00:00:00 2001
From: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Date: Fri, 2 Dec 2022 12:40:29 +0100
Subject: [PATCH] bpf: do not rely on ALLOW_ERROR_INJECTION for fmod_ret

The current way of expressing that a non-bpf kernel component is willing
to accept that bpf programs can be attached to it and that they can change
the return value is to abuse ALLOW_ERROR_INJECTION.
This is debated in the link below, and the result is that it is not a
reasonable thing to do.

An easy fix is to keep the list of valid functions in the BPF verifier
in the same way we keep the non-sleepable ALLOW_ERROR_INJECTION ones.
However, this kind of defeat the point of being able to add bpf APIs in
non-BPF subsystems, so we probably need to rethink that part.


Link: https://lore.kernel.org/all/20221121104403.1545f9b5@gandalf.local.home/
Suggested-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
---
  drivers/hid/bpf/hid_bpf_dispatch.c  |  2 --
  drivers/hid/bpf/hid_bpf_jmp_table.c |  1 -
  kernel/bpf/verifier.c               | 20 +++++++++++++++++++-
  3 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/drivers/hid/bpf/hid_bpf_dispatch.c b/drivers/hid/bpf/hid_bpf_dispatch.c
index 3c989e74d249..d1f6a1d4ae60 100644
--- a/drivers/hid/bpf/hid_bpf_dispatch.c
+++ b/drivers/hid/bpf/hid_bpf_dispatch.c
@@ -44,7 +44,6 @@ __weak noinline int hid_bpf_device_event(struct hid_bpf_ctx *ctx)
  {
  	return 0;
  }
-ALLOW_ERROR_INJECTION(hid_bpf_device_event, ERRNO);
  
  u8 *
  dispatch_hid_bpf_device_event(struct hid_device *hdev, enum hid_report_type type, u8 *data,
@@ -105,7 +104,6 @@ __weak noinline int hid_bpf_rdesc_fixup(struct hid_bpf_ctx *ctx)
  {
  	return 0;
  }
-ALLOW_ERROR_INJECTION(hid_bpf_rdesc_fixup, ERRNO);
  
  u8 *call_hid_bpf_rdesc_fixup(struct hid_device *hdev, u8 *rdesc, unsigned int *size)
  {
diff --git a/drivers/hid/bpf/hid_bpf_jmp_table.c b/drivers/hid/bpf/hid_bpf_jmp_table.c
index 579a6c06906e..207972b028d9 100644
--- a/drivers/hid/bpf/hid_bpf_jmp_table.c
+++ b/drivers/hid/bpf/hid_bpf_jmp_table.c
@@ -103,7 +103,6 @@ __weak noinline int __hid_bpf_tail_call(struct hid_bpf_ctx *ctx)
  {
  	return 0;
  }
-ALLOW_ERROR_INJECTION(__hid_bpf_tail_call, ERRNO);
  
  int hid_bpf_prog_run(struct hid_device *hdev, enum hid_bpf_prog_type type,
  		     struct hid_bpf_ctx_kern *ctx_kern)
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 225666307bba..4eac440d659f 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -24,6 +24,7 @@
  #include <linux/bpf_lsm.h>
  #include <linux/btf_ids.h>
  #include <linux/poison.h>
+#include <linux/hid_bpf.h>
  
  #include "disasm.h"
  
@@ -14827,6 +14828,20 @@ static int check_non_sleepable_error_inject(u32 btf_id)
  	return btf_id_set_contains(&btf_non_sleepable_error_inject, btf_id);
  }
  
+/* Manually tag fmod_ret functions to not misuse ALLOW_ERROR_INJECTION */
+BTF_SET_START(btf_modify_return)
+#if CONFIG_HID_BPF
+BTF_ID(func, hid_bpf_device_event)
+BTF_ID(func, hid_bpf_rdesc_fixup)
+BTF_ID(func, __hid_bpf_tail_call)
+#endif /* CONFIG_HID_BPF */
+BTF_SET_END(btf_modify_return)
+
+static int check_function_modify_return(u32 btf_id)
+{
+	return btf_id_set_contains(&btf_modify_return, btf_id);
+}
+
  int bpf_check_attach_target(struct bpf_verifier_log *log,
  			    const struct bpf_prog *prog,
  			    const struct bpf_prog *tgt_prog,
@@ -15047,7 +15062,10 @@ int bpf_check_attach_target(struct bpf_verifier_log *log,
  				bpf_log(log, "can't modify return codes of BPF programs\n");
  				return -EINVAL;
  			}
-			ret = check_attach_modify_return(addr, tname);
+			ret = -EINVAL;
+			if (!check_function_modify_return(btf_id) ||
+			    check_attach_modify_return(addr, tname))
+				ret = 0;
  			if (ret) {
  				bpf_log(log, "%s() is not modifiable\n", tname);
  				return ret;
  
Theodore Ts'o Dec. 2, 2022, 3:56 p.m. UTC | #24
On Thu, Dec 01, 2022 at 05:41:29PM -0800, Alexei Starovoitov wrote:
> 
> The fault injection framework disables individual syscall with zero performance
> overhead comparing to LSM and seccomp mechanisms.
> BPF is not involved here. It's a kprobe in one spot.
> All other syscalls don't notice it.
> It's an attractive way to improve security.
> 
> A BPF prog over syscall can filter by user, cgroup, task and give fine grain
> control over security surface.
> tbh I'm not aware of folks doing "syscall disabling" through command line like
> above (I've only seen it through bpf), but it doesn't mean that somebody will
> not start complaining that their script broke, because distro disabled fault
> injection.
> 
> So should we split FUNCTION_ERROR_INJECTION kconfig into two ?
> And do default N for things like should_failslab() and
> default Y for syscalls?

How about calling the latter something like bpf syscall hooks, and not
using the terminology "error injection" in relation to system calls?
I think that might be less confusing.

							- Ted
  
Alexei Starovoitov Dec. 2, 2022, 7:30 p.m. UTC | #25
On Fri, Dec 02, 2022 at 03:55:38PM +0100, Benjamin Tissoires wrote:
> 
> 
> On 12/1/22 22:13, Linus Torvalds wrote:
> > On Thu, Dec 1, 2022 at 8:59 AM Alexei Starovoitov
> > <alexei.starovoitov@gmail.com> wrote:
> > > 
> > > The hid-bpf framework depends on it.
> > 
> > Ok, this is completely unacceptably disgusting hack.
> 
> [foreword: I have read the other replies, just replying to this one
> because it is the explicit ask for a fix]
> 
> > 
> > That needs fixing.
> > 
> > > Either hid-bpf or bpf core can add
> > > "depends on FUNCTION_ERROR_INJECTION"
> > 
> > No, it needs to be narrowed down a lot. Nobody sane wants error
> > injection just because they want some random HID thing.
> > 
> > And no, BPF shouldn't need it either.
> > 
> > This needs to be narrowed down to the point where HID can say "I want
> > *this* particular call to be able to be a bpf call.
> 
> So, would the following be OK? I still have a few concerns I'll explain
> after the patch.
> 
> ---
> From 1290561304eb3e48e1e6d727def13c16698a26f1 Mon Sep 17 00:00:00 2001
> From: Benjamin Tissoires <benjamin.tissoires@redhat.com>
> Date: Fri, 2 Dec 2022 12:40:29 +0100
> Subject: [PATCH] bpf: do not rely on ALLOW_ERROR_INJECTION for fmod_ret
> 
> The current way of expressing that a non-bpf kernel component is willing
> to accept that bpf programs can be attached to it and that they can change
> the return value is to abuse ALLOW_ERROR_INJECTION.
> This is debated in the link below, and the result is that it is not a
> reasonable thing to do.
> 
> An easy fix is to keep the list of valid functions in the BPF verifier
> in the same way we keep the non-sleepable ALLOW_ERROR_INJECTION ones.
> However, this kind of defeat the point of being able to add bpf APIs in
> non-BPF subsystems, so we probably need to rethink that part.
> 
> 
> Link: https://lore.kernel.org/all/20221121104403.1545f9b5@gandalf.local.home/
> Suggested-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
> Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
> ---
>  drivers/hid/bpf/hid_bpf_dispatch.c  |  2 --
>  drivers/hid/bpf/hid_bpf_jmp_table.c |  1 -
>  kernel/bpf/verifier.c               | 20 +++++++++++++++++++-
>  3 files changed, 19 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/hid/bpf/hid_bpf_dispatch.c b/drivers/hid/bpf/hid_bpf_dispatch.c
> index 3c989e74d249..d1f6a1d4ae60 100644
> --- a/drivers/hid/bpf/hid_bpf_dispatch.c
> +++ b/drivers/hid/bpf/hid_bpf_dispatch.c
> @@ -44,7 +44,6 @@ __weak noinline int hid_bpf_device_event(struct hid_bpf_ctx *ctx)
>  {
>  	return 0;
>  }
> -ALLOW_ERROR_INJECTION(hid_bpf_device_event, ERRNO);
>  u8 *
>  dispatch_hid_bpf_device_event(struct hid_device *hdev, enum hid_report_type type, u8 *data,
> @@ -105,7 +104,6 @@ __weak noinline int hid_bpf_rdesc_fixup(struct hid_bpf_ctx *ctx)
>  {
>  	return 0;
>  }
> -ALLOW_ERROR_INJECTION(hid_bpf_rdesc_fixup, ERRNO);
>  u8 *call_hid_bpf_rdesc_fixup(struct hid_device *hdev, u8 *rdesc, unsigned int *size)
>  {
> diff --git a/drivers/hid/bpf/hid_bpf_jmp_table.c b/drivers/hid/bpf/hid_bpf_jmp_table.c
> index 579a6c06906e..207972b028d9 100644
> --- a/drivers/hid/bpf/hid_bpf_jmp_table.c
> +++ b/drivers/hid/bpf/hid_bpf_jmp_table.c
> @@ -103,7 +103,6 @@ __weak noinline int __hid_bpf_tail_call(struct hid_bpf_ctx *ctx)
>  {
>  	return 0;
>  }
> -ALLOW_ERROR_INJECTION(__hid_bpf_tail_call, ERRNO);
>  int hid_bpf_prog_run(struct hid_device *hdev, enum hid_bpf_prog_type type,
>  		     struct hid_bpf_ctx_kern *ctx_kern)
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 225666307bba..4eac440d659f 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -24,6 +24,7 @@
>  #include <linux/bpf_lsm.h>
>  #include <linux/btf_ids.h>
>  #include <linux/poison.h>
> +#include <linux/hid_bpf.h>
>  #include "disasm.h"
> @@ -14827,6 +14828,20 @@ static int check_non_sleepable_error_inject(u32 btf_id)
>  	return btf_id_set_contains(&btf_non_sleepable_error_inject, btf_id);
>  }
> +/* Manually tag fmod_ret functions to not misuse ALLOW_ERROR_INJECTION */
> +BTF_SET_START(btf_modify_return)
> +#if CONFIG_HID_BPF
> +BTF_ID(func, hid_bpf_device_event)
> +BTF_ID(func, hid_bpf_rdesc_fixup)
> +BTF_ID(func, __hid_bpf_tail_call)
> +#endif /* CONFIG_HID_BPF */
> +BTF_SET_END(btf_modify_return)
> +
> +static int check_function_modify_return(u32 btf_id)
> +{
> +	return btf_id_set_contains(&btf_modify_return, btf_id);
> +}
> +
>  int bpf_check_attach_target(struct bpf_verifier_log *log,
>  			    const struct bpf_prog *prog,
>  			    const struct bpf_prog *tgt_prog,
> @@ -15047,7 +15062,10 @@ int bpf_check_attach_target(struct bpf_verifier_log *log,
>  				bpf_log(log, "can't modify return codes of BPF programs\n");
>  				return -EINVAL;
>  			}
> -			ret = check_attach_modify_return(addr, tname);
> +			ret = -EINVAL;
> +			if (!check_function_modify_return(btf_id) ||
> +			    check_attach_modify_return(addr, tname))
> +				ret = 0;
>  			if (ret) {
>  				bpf_log(log, "%s() is not modifiable\n", tname);
>  				return ret;
> -- 
> 2.38.1
> ---
> 
> While this patch removes the need for ALLOW_ERROR_INJECTION it has a
> couple of drawbacks:
> - suddenly we lose the nice separation of concerns between bpf core and
> its users (HID in my case)
> - it would need to be changed in 6.3 simply because of the previous
> point, so it is just a temporary fix.

Agree, but it works short term.
A silver lining is BTF_SET approach consumes 4 bytes per mark
while ALLOW_ERROR_INJECTION consumes 16 bytes for struct error_injection_entry
and then another 48 bytes per mark for struct ei_entry.

An alternative would be to define a known prefix like "bpf_modret_"
or "bpf_hook_" and rename these three functions.

Then there will be no need for #include <linux/hid_bpf.h> in bpf core.

> So I am not sure if this would qualify HID-BPF for 6.2. Please speak up.

Since that was the only thing I think it's fine to stay in the queue.
  
Alexei Starovoitov Dec. 2, 2022, 9:27 p.m. UTC | #26
On Fri, Dec 02, 2022 at 10:56:52AM -0500, Theodore Ts'o wrote:
> On Thu, Dec 01, 2022 at 05:41:29PM -0800, Alexei Starovoitov wrote:
> > 
> > The fault injection framework disables individual syscall with zero performance
> > overhead comparing to LSM and seccomp mechanisms.
> > BPF is not involved here. It's a kprobe in one spot.
> > All other syscalls don't notice it.
> > It's an attractive way to improve security.
> > 
> > A BPF prog over syscall can filter by user, cgroup, task and give fine grain
> > control over security surface.
> > tbh I'm not aware of folks doing "syscall disabling" through command line like
> > above (I've only seen it through bpf), but it doesn't mean that somebody will
> > not start complaining that their script broke, because distro disabled fault
> > injection.
> > 
> > So should we split FUNCTION_ERROR_INJECTION kconfig into two ?
> > And do default N for things like should_failslab() and
> > default Y for syscalls?
> 
> How about calling the latter something like bpf syscall hooks, and not
> using the terminology "error injection" in relation to system calls?
> I think that might be less confusing.

I think 'syscall error injection' name fits well.
It's a generic feature that both kprobes and bpf should be able to use.
Here is the patch...

Even with this patch we have 7 failures in BPF selftests.
We will fix them later with the same mechanism as we will pick for hid-bpf.

This patch will keep 'syscall disabling' scripts working
and bpf syscall adjustment will work too.
So no chance of breaking anyone.
While actual error injection inside the kernel will be disabled.

Better name suggestions are welcome, of course.

From 2960958f91d1134b1a8f27787875f6b9300f205e Mon Sep 17 00:00:00 2001
From: Alexei Starovoitov <ast@kernel.org>
Date: Fri, 2 Dec 2022 13:06:08 -0800
Subject: [PATCH] error-injection: Split FUNCTION_ERROR_INJECTION into syscalls
 and the rest.

Split FUNCTION_ERROR_INJECTION into:
- SYSCALL_ERROR_INJECTION with default y
- FUNC_ERROR_INJECTION with default n.

The former is only used to modify return values of syscalls for security and
user space testing reasons while the latter is for the rest of error injection
in the kernel that should only be used to stress test and debug the kernel.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
 arch/arm64/include/asm/syscall_wrapper.h   |  8 ++++----
 arch/powerpc/include/asm/syscall_wrapper.h |  4 ++--
 arch/s390/include/asm/syscall_wrapper.h    | 12 ++++++------
 arch/x86/include/asm/syscall_wrapper.h     |  4 ++--
 include/asm-generic/error-injection.h      |  1 +
 include/linux/compat.h                     |  4 ++--
 include/linux/syscalls.h                   |  4 ++--
 kernel/fail_function.c                     |  1 +
 lib/Kconfig.debug                          | 15 +++++++++++++++
 lib/error-inject.c                         |  6 ++++++
 10 files changed, 41 insertions(+), 18 deletions(-)

diff --git a/arch/arm64/include/asm/syscall_wrapper.h b/arch/arm64/include/asm/syscall_wrapper.h
index d30217c21eff..2c5ca239e88c 100644
--- a/arch/arm64/include/asm/syscall_wrapper.h
+++ b/arch/arm64/include/asm/syscall_wrapper.h
@@ -19,7 +19,7 @@
 
 #define COMPAT_SYSCALL_DEFINEx(x, name, ...)						\
 	asmlinkage long __arm64_compat_sys##name(const struct pt_regs *regs);		\
-	ALLOW_ERROR_INJECTION(__arm64_compat_sys##name, ERRNO);				\
+	ALLOW_ERROR_INJECTION(__arm64_compat_sys##name, SYSCALL);				\
 	static long __se_compat_sys##name(__MAP(x,__SC_LONG,__VA_ARGS__));		\
 	static inline long __do_compat_sys##name(__MAP(x,__SC_DECL,__VA_ARGS__));	\
 	asmlinkage long __arm64_compat_sys##name(const struct pt_regs *regs)		\
@@ -34,7 +34,7 @@
 
 #define COMPAT_SYSCALL_DEFINE0(sname)							\
 	asmlinkage long __arm64_compat_sys_##sname(const struct pt_regs *__unused);	\
-	ALLOW_ERROR_INJECTION(__arm64_compat_sys_##sname, ERRNO);			\
+	ALLOW_ERROR_INJECTION(__arm64_compat_sys_##sname, SYSCALL);			\
 	asmlinkage long __arm64_compat_sys_##sname(const struct pt_regs *__unused)
 
 #define COND_SYSCALL_COMPAT(name) 							\
@@ -50,7 +50,7 @@
 
 #define __SYSCALL_DEFINEx(x, name, ...)						\
 	asmlinkage long __arm64_sys##name(const struct pt_regs *regs);		\
-	ALLOW_ERROR_INJECTION(__arm64_sys##name, ERRNO);			\
+	ALLOW_ERROR_INJECTION(__arm64_sys##name, SYSCALL);			\
 	static long __se_sys##name(__MAP(x,__SC_LONG,__VA_ARGS__));		\
 	static inline long __do_sys##name(__MAP(x,__SC_DECL,__VA_ARGS__));	\
 	asmlinkage long __arm64_sys##name(const struct pt_regs *regs)		\
@@ -69,7 +69,7 @@
 #define SYSCALL_DEFINE0(sname)							\
 	SYSCALL_METADATA(_##sname, 0);						\
 	asmlinkage long __arm64_sys_##sname(const struct pt_regs *__unused);	\
-	ALLOW_ERROR_INJECTION(__arm64_sys_##sname, ERRNO);			\
+	ALLOW_ERROR_INJECTION(__arm64_sys_##sname, SYSCALL);			\
 	asmlinkage long __arm64_sys_##sname(const struct pt_regs *__unused)
 
 #define COND_SYSCALL(name)							\
diff --git a/arch/powerpc/include/asm/syscall_wrapper.h b/arch/powerpc/include/asm/syscall_wrapper.h
index 67486c67e8a2..ce1148809c6b 100644
--- a/arch/powerpc/include/asm/syscall_wrapper.h
+++ b/arch/powerpc/include/asm/syscall_wrapper.h
@@ -17,7 +17,7 @@ struct pt_regs;
 
 #define __SYSCALL_DEFINEx(x, name, ...)						\
 	long sys##name(const struct pt_regs *regs);			\
-	ALLOW_ERROR_INJECTION(sys##name, ERRNO);			\
+	ALLOW_ERROR_INJECTION(sys##name, SYSCALL);			\
 	static long __se_sys##name(__MAP(x,__SC_LONG,__VA_ARGS__));		\
 	static inline long __do_sys##name(__MAP(x,__SC_DECL,__VA_ARGS__));	\
 	long sys##name(const struct pt_regs *regs)			\
@@ -36,7 +36,7 @@ struct pt_regs;
 #define SYSCALL_DEFINE0(sname)							\
 	SYSCALL_METADATA(_##sname, 0);						\
 	long sys_##sname(const struct pt_regs *__unused);		\
-	ALLOW_ERROR_INJECTION(sys_##sname, ERRNO);			\
+	ALLOW_ERROR_INJECTION(sys_##sname, SYSCALL);			\
 	long sys_##sname(const struct pt_regs *__unused)
 
 #define COND_SYSCALL(name)							\
diff --git a/arch/s390/include/asm/syscall_wrapper.h b/arch/s390/include/asm/syscall_wrapper.h
index fde7e6b1df48..a253def48cbe 100644
--- a/arch/s390/include/asm/syscall_wrapper.h
+++ b/arch/s390/include/asm/syscall_wrapper.h
@@ -58,7 +58,7 @@
 
 #define __S390_SYS_STUBx(x, name, ...)						\
 	long __s390_sys##name(struct pt_regs *regs);				\
-	ALLOW_ERROR_INJECTION(__s390_sys##name, ERRNO);				\
+	ALLOW_ERROR_INJECTION(__s390_sys##name, SYSCALL);				\
 	long __s390_sys##name(struct pt_regs *regs)				\
 	{									\
 		long ret = __do_sys##name(SYSCALL_PT_ARGS(x, regs,		\
@@ -74,13 +74,13 @@
 #define COMPAT_SYSCALL_DEFINE0(sname)					\
 	SYSCALL_METADATA(_##sname, 0);					\
 	long __s390_compat_sys_##sname(void);				\
-	ALLOW_ERROR_INJECTION(__s390_compat_sys_##sname, ERRNO);	\
+	ALLOW_ERROR_INJECTION(__s390_compat_sys_##sname, SYSCALL);	\
 	long __s390_compat_sys_##sname(void)
 
 #define SYSCALL_DEFINE0(sname)						\
 	SYSCALL_METADATA(_##sname, 0);					\
 	long __s390x_sys_##sname(void);					\
-	ALLOW_ERROR_INJECTION(__s390x_sys_##sname, ERRNO);		\
+	ALLOW_ERROR_INJECTION(__s390x_sys_##sname, SYSCALL);		\
 	long __s390_sys_##sname(void)					\
 		__attribute__((alias(__stringify(__s390x_sys_##sname)))); \
 	long __s390x_sys_##sname(void)
@@ -100,7 +100,7 @@
 	long __s390_compat_sys##name(struct pt_regs *regs);				\
 	long __s390_compat_sys##name(struct pt_regs *regs)				\
 		__attribute__((alias(__stringify(__se_compat_sys##name))));		\
-	ALLOW_ERROR_INJECTION(__s390_compat_sys##name, ERRNO);				\
+	ALLOW_ERROR_INJECTION(__s390_compat_sys##name, SYSCALL);				\
 	static inline long __do_compat_sys##name(__MAP(x,__SC_DECL,__VA_ARGS__));	\
 	long __se_compat_sys##name(struct pt_regs *regs);				\
 	long __se_compat_sys##name(struct pt_regs *regs)				\
@@ -131,7 +131,7 @@
 #define SYSCALL_DEFINE0(sname)						\
 	SYSCALL_METADATA(_##sname, 0);					\
 	long __s390x_sys_##sname(void);					\
-	ALLOW_ERROR_INJECTION(__s390x_sys_##sname, ERRNO);		\
+	ALLOW_ERROR_INJECTION(__s390x_sys_##sname, SYSCALL);		\
 	long __s390x_sys_##sname(void)
 
 #define COND_SYSCALL(name)						\
@@ -148,7 +148,7 @@
 		      "Type aliasing is used to sanitize syscall arguments");		\
 	long __s390x_sys##name(struct pt_regs *regs)					\
 		__attribute__((alias(__stringify(__se_sys##name))));			\
-	ALLOW_ERROR_INJECTION(__s390x_sys##name, ERRNO);				\
+	ALLOW_ERROR_INJECTION(__s390x_sys##name, SYSCALL);				\
 	static inline long __do_sys##name(__MAP(x,__SC_DECL,__VA_ARGS__));		\
 	long __se_sys##name(struct pt_regs *regs);					\
 	__S390_SYS_STUBx(x, name, __VA_ARGS__)						\
diff --git a/arch/x86/include/asm/syscall_wrapper.h b/arch/x86/include/asm/syscall_wrapper.h
index fd2669b1cb2d..ca0cd8fa1866 100644
--- a/arch/x86/include/asm/syscall_wrapper.h
+++ b/arch/x86/include/asm/syscall_wrapper.h
@@ -67,13 +67,13 @@ extern long __ia32_sys_ni_syscall(const struct pt_regs *regs);
 
 #define __SYS_STUB0(abi, name)						\
 	long __##abi##_##name(const struct pt_regs *regs);		\
-	ALLOW_ERROR_INJECTION(__##abi##_##name, ERRNO);			\
+	ALLOW_ERROR_INJECTION(__##abi##_##name, SYSCALL);			\
 	long __##abi##_##name(const struct pt_regs *regs)		\
 		__alias(__do_##name);
 
 #define __SYS_STUBx(abi, name, ...)					\
 	long __##abi##_##name(const struct pt_regs *regs);		\
-	ALLOW_ERROR_INJECTION(__##abi##_##name, ERRNO);			\
+	ALLOW_ERROR_INJECTION(__##abi##_##name, SYSCALL);			\
 	long __##abi##_##name(const struct pt_regs *regs)		\
 	{								\
 		return __se_##name(__VA_ARGS__);			\
diff --git a/include/asm-generic/error-injection.h b/include/asm-generic/error-injection.h
index fbca56bd9cbc..c4fb52f5b789 100644
--- a/include/asm-generic/error-injection.h
+++ b/include/asm-generic/error-injection.h
@@ -9,6 +9,7 @@ enum {
 	EI_ETYPE_ERRNO,		/* Return -ERRNO if failure */
 	EI_ETYPE_ERRNO_NULL,	/* Return -ERRNO or NULL if failure */
 	EI_ETYPE_TRUE,		/* Return true if failure */
+	EI_ETYPE_SYSCALL,	/* Return -ERRNO out of syscall */
 };
 
 struct error_injection_entry {
diff --git a/include/linux/compat.h b/include/linux/compat.h
index 594357881b0b..21d2fd48f7e2 100644
--- a/include/linux/compat.h
+++ b/include/linux/compat.h
@@ -45,7 +45,7 @@
 #ifndef COMPAT_SYSCALL_DEFINE0
 #define COMPAT_SYSCALL_DEFINE0(name) \
 	asmlinkage long compat_sys_##name(void); \
-	ALLOW_ERROR_INJECTION(compat_sys_##name, ERRNO); \
+	ALLOW_ERROR_INJECTION(compat_sys_##name, SYSCALL); \
 	asmlinkage long compat_sys_##name(void)
 #endif /* COMPAT_SYSCALL_DEFINE0 */
 
@@ -74,7 +74,7 @@
 		      "Type aliasing is used to sanitize syscall arguments");\
 	asmlinkage long compat_sys##name(__MAP(x,__SC_DECL,__VA_ARGS__))	\
 		__attribute__((alias(__stringify(__se_compat_sys##name))));	\
-	ALLOW_ERROR_INJECTION(compat_sys##name, ERRNO);				\
+	ALLOW_ERROR_INJECTION(compat_sys##name, SYSCALL);				\
 	static inline long __do_compat_sys##name(__MAP(x,__SC_DECL,__VA_ARGS__));\
 	asmlinkage long __se_compat_sys##name(__MAP(x,__SC_LONG,__VA_ARGS__));	\
 	asmlinkage long __se_compat_sys##name(__MAP(x,__SC_LONG,__VA_ARGS__))	\
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index a34b0f9a9972..05fc3a0575c0 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -210,7 +210,7 @@ static inline int is_syscall_trace_event(struct trace_event_call *tp_event)
 #define SYSCALL_DEFINE0(sname)					\
 	SYSCALL_METADATA(_##sname, 0);				\
 	asmlinkage long sys_##sname(void);			\
-	ALLOW_ERROR_INJECTION(sys_##sname, ERRNO);		\
+	ALLOW_ERROR_INJECTION(sys_##sname, SYSCALL);		\
 	asmlinkage long sys_##sname(void)
 #endif /* SYSCALL_DEFINE0 */
 
@@ -241,7 +241,7 @@ static inline int is_syscall_trace_event(struct trace_event_call *tp_event)
 		      "Type aliasing is used to sanitize syscall arguments");\
 	asmlinkage long sys##name(__MAP(x,__SC_DECL,__VA_ARGS__))	\
 		__attribute__((alias(__stringify(__se_sys##name))));	\
-	ALLOW_ERROR_INJECTION(sys##name, ERRNO);			\
+	ALLOW_ERROR_INJECTION(sys##name, SYSCALL);			\
 	static inline long __do_sys##name(__MAP(x,__SC_DECL,__VA_ARGS__));\
 	asmlinkage long __se_sys##name(__MAP(x,__SC_LONG,__VA_ARGS__));	\
 	asmlinkage long __se_sys##name(__MAP(x,__SC_LONG,__VA_ARGS__))	\
diff --git a/kernel/fail_function.c b/kernel/fail_function.c
index a7ccd2930c5f..65d3f5db5f3a 100644
--- a/kernel/fail_function.c
+++ b/kernel/fail_function.c
@@ -38,6 +38,7 @@ static unsigned long adjust_error_retval(unsigned long addr, unsigned long retv)
 	switch (get_injectable_error_type(addr)) {
 	case EI_ETYPE_NULL:
 		return 0;
+	case EI_ETYPE_SYSCALL:
 	case EI_ETYPE_ERRNO:
 		if (retv < (unsigned long)-MAX_ERRNO)
 			return (unsigned long)-EINVAL;
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 38545c56bf69..729002405a55 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1874,8 +1874,12 @@ config NETDEV_NOTIFIER_ERROR_INJECT
 	  If unsure, say N.
 
 config FUNCTION_ERROR_INJECTION
+        bool
+
+config FUNC_ERROR_INJECTION
 	bool "Fault-injections of functions"
 	depends on HAVE_FUNCTION_ERROR_INJECTION && KPROBES
+	select FUNCTION_ERROR_INJECTION
 	help
 	  Add fault injections into various functions that are annotated with
 	  ALLOW_ERROR_INJECTION() in the kernel. BPF may also modify the return
@@ -1883,6 +1887,17 @@ config FUNCTION_ERROR_INJECTION
 
 	  If unsure, say N
 
+config SYSCALL_ERROR_INJECTION
+	bool "Error injections in syscalls"
+	depends on HAVE_FUNCTION_ERROR_INJECTION && KPROBES
+	select FUNCTION_ERROR_INJECTION
+	default y
+	help
+	  Allows error injection framework to return errors from syscalls.
+	  BPF may modify return values of syscalls as well.
+
+	  If unsure, say Y
+
 config FAULT_INJECTION
 	bool "Fault-injection framework"
 	depends on DEBUG_KERNEL
diff --git a/lib/error-inject.c b/lib/error-inject.c
index 1afca1b1cdea..9ba868eb8c43 100644
--- a/lib/error-inject.c
+++ b/lib/error-inject.c
@@ -71,6 +71,10 @@ static void populate_error_injection_list(struct error_injection_entry *start,
 
 	mutex_lock(&ei_mutex);
 	for (iter = start; iter < end; iter++) {
+		if (iter->etype != EI_ETYPE_SYSCALL &&
+		    !IS_ENABLED(CONFIG_FUNC_ERROR_INJECTION))
+			continue;
+
 		entry = (unsigned long)dereference_symbol_descriptor((void *)iter->addr);
 
 		if (!kernel_text_address(entry) ||
@@ -189,6 +193,8 @@ static const char *error_type_string(int etype)
 		return "ERRNO_NULL";
 	case EI_ETYPE_TRUE:
 		return "TRUE";
+	case EI_ETYPE_SYSCALL:
+		return "SYSCALL";
 	default:
 		return "(unknown)";
 	}
  
Steven Rostedt Dec. 2, 2022, 11:17 p.m. UTC | #27
On Fri, 2 Dec 2022 13:27:11 -0800
Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:

> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -1874,8 +1874,12 @@ config NETDEV_NOTIFIER_ERROR_INJECT
>  	  If unsure, say N.
>  
>  config FUNCTION_ERROR_INJECTION

Why not just call this "ERROR_INJECTION" having this be FUNCTION and the
one for functions be FUNC is confusing.

> +        bool
> +
> +config FUNC_ERROR_INJECTION
>  	bool "Fault-injections of functions"
>  	depends on HAVE_FUNCTION_ERROR_INJECTION && KPROBES
> +	select FUNCTION_ERROR_INJECTION
>  	help
>  	  Add fault injections into various functions that are annotated with
>  	  ALLOW_ERROR_INJECTION() in the kernel. BPF may also modify the return
> @@ -1883,6 +1887,17 @@ config FUNCTION_ERROR_INJECTION
>  
>  	  If unsure, say N
>  
> +config SYSCALL_ERROR_INJECTION
> +	bool "Error injections in syscalls"
> +	depends on HAVE_FUNCTION_ERROR_INJECTION && KPROBES
> +	select FUNCTION_ERROR_INJECTION
> +	default y

IIUC, Linus prefers everything to be "default n" unless there's a really
good reason for it. Like only making other options available, but not doing
anything to the kernel. I do have DYNAMIC_FTRACE as "default y" but that's
only because it depends on CONFIG_FUNCTION_TRACER and nobody that enables
that should have DYNAMIC_FTRACE off (except for academia).

> +	help
> +	  Allows error injection framework to return errors from syscalls.
> +	  BPF may modify return values of syscalls as well.

And here's the thing. If BPF returns anything *but* an error, then this is
a misnomer and incorrect. Name it something else like "HIJACK_SYSCALLS".

> +
> +	  If unsure, say Y

And I'm curious, why Y if unsure?

-- Steve

> +
>  config FAULT_INJECTION
>  	bool "Fault-injection framework"
>  	depends on DEBUG_KERNEL
> diff --git a/lib/error-inject.c b/lib/error-inject.c
> index 1afca1b1cdea..9ba868eb8c43 100644
> --- a/lib/error-inject.c
> +++ b/lib/error-inject.c
> @@ -71,6 +71,10 @@ static void populate_error_injection_list(struct error_injection_entry *start,
>  
>  	mutex_lock(&ei_mutex);
>  	for (iter = start; iter < end; iter++) {
> +		if (iter->etype != EI_ETYPE_SYSCALL &&
> +		    !IS_ENABLED(CONFIG_FUNC_ERROR_INJECTION))
> +			continue;
> +
>  		entry = (unsigned long)dereference_symbol_descriptor((void *)iter->addr);
>  
>  		if (!kernel_text_address(entry) ||
> @@ -189,6 +193,8 @@ static const char *error_type_string(int etype)
>  		return "ERRNO_NULL";
>  	case EI_ETYPE_TRUE:
>  		return "TRUE";
> +	case EI_ETYPE_SYSCALL:
> +		return "SYSCALL";
>  	default:
>  		return "(unknown)";
>  	}
  
Alexei Starovoitov Dec. 3, 2022, 12:55 a.m. UTC | #28
On Fri, Dec 02, 2022 at 06:17:24PM -0500, Steven Rostedt wrote:
> On Fri, 2 Dec 2022 13:27:11 -0800
> Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> 
> > --- a/lib/Kconfig.debug
> > +++ b/lib/Kconfig.debug
> > @@ -1874,8 +1874,12 @@ config NETDEV_NOTIFIER_ERROR_INJECT
> >  	  If unsure, say N.
> >  
> >  config FUNCTION_ERROR_INJECTION
> 
> Why not just call this "ERROR_INJECTION" having this be FUNCTION and the
> one for functions be FUNC is confusing.

That's what I had initially, but it causes plenty of churn to arch/*/Makefile
and a bunch of .h-s.
Keeping it as FUNCTION_ERROR_INJECTION removes all that noise from the diff.

> > +        bool
> > +
> > +config FUNC_ERROR_INJECTION
> >  	bool "Fault-injections of functions"
> >  	depends on HAVE_FUNCTION_ERROR_INJECTION && KPROBES
> > +	select FUNCTION_ERROR_INJECTION
> >  	help
> >  	  Add fault injections into various functions that are annotated with
> >  	  ALLOW_ERROR_INJECTION() in the kernel. BPF may also modify the return
> > @@ -1883,6 +1887,17 @@ config FUNCTION_ERROR_INJECTION
> >  
> >  	  If unsure, say N
> >  
> > +config SYSCALL_ERROR_INJECTION
> > +	bool "Error injections in syscalls"
> > +	depends on HAVE_FUNCTION_ERROR_INJECTION && KPROBES
> > +	select FUNCTION_ERROR_INJECTION
> > +	default y
> 
> IIUC, Linus prefers everything to be "default n" unless there's a really
> good reason for it. Like only making other options available, but not doing
> anything to the kernel. I do have DYNAMIC_FTRACE as "default y" but that's
> only because it depends on CONFIG_FUNCTION_TRACER and nobody that enables
> that should have DYNAMIC_FTRACE off (except for academia).

The FUNCTION_ERROR_INJECTION used to be "def_bool y" for ~5 years.
BTW the macro was called BPF_ALLOW_ERROR_INJECTION() when Josef initially implemented it.
Massami later renamed it ALLOW_ERROR_INJECTION() and allowed kprobes to use it.
Today there is a user expectation that this feature is available in the kernel.
We can do "default n" here, let distros decide and potentially upset users.
I don't feel strongly about that.

> 
> > +	help
> > +	  Allows error injection framework to return errors from syscalls.
> > +	  BPF may modify return values of syscalls as well.
> 
> And here's the thing. If BPF returns anything *but* an error, then this is
> a misnomer and incorrect. Name it something else like "HIJACK_SYSCALLS".

The bpf prog must return errno. No doubt about that.
Today the verifier validates return values whenever is necessary.
When original bpf_override_return was added the verifier wasn't that smart.
Since then we added return value checks pretty much everywhere.
Looks like the check is still missing bpf_override_return.
We will fix it asap.

> > +
> > +	  If unsure, say Y
> 
> And I'm curious, why Y if unsure?

Copy-paste. I can remove that line.
  
Masami Hiramatsu (Google) Dec. 4, 2022, 10:50 p.m. UTC | #29
On Fri, 2 Dec 2022 13:27:11 -0800
Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:

> On Fri, Dec 02, 2022 at 10:56:52AM -0500, Theodore Ts'o wrote:
> > On Thu, Dec 01, 2022 at 05:41:29PM -0800, Alexei Starovoitov wrote:
> > > 
> > > The fault injection framework disables individual syscall with zero performance
> > > overhead comparing to LSM and seccomp mechanisms.
> > > BPF is not involved here. It's a kprobe in one spot.
> > > All other syscalls don't notice it.
> > > It's an attractive way to improve security.
> > > 
> > > A BPF prog over syscall can filter by user, cgroup, task and give fine grain
> > > control over security surface.
> > > tbh I'm not aware of folks doing "syscall disabling" through command line like
> > > above (I've only seen it through bpf), but it doesn't mean that somebody will
> > > not start complaining that their script broke, because distro disabled fault
> > > injection.
> > > 
> > > So should we split FUNCTION_ERROR_INJECTION kconfig into two ?
> > > And do default N for things like should_failslab() and
> > > default Y for syscalls?
> > 
> > How about calling the latter something like bpf syscall hooks, and not
> > using the terminology "error injection" in relation to system calls?
> > I think that might be less confusing.
> 
> I think 'syscall error injection' name fits well.
> It's a generic feature that both kprobes and bpf should be able to use.
> Here is the patch...
> 
> Even with this patch we have 7 failures in BPF selftests.
> We will fix them later with the same mechanism as we will pick for hid-bpf.
> 
> This patch will keep 'syscall disabling' scripts working
> and bpf syscall adjustment will work too.
> So no chance of breaking anyone.
> While actual error injection inside the kernel will be disabled.
> 
> Better name suggestions are welcome, of course.
> 
> From 2960958f91d1134b1a8f27787875f6b9300f205e Mon Sep 17 00:00:00 2001
> From: Alexei Starovoitov <ast@kernel.org>
> Date: Fri, 2 Dec 2022 13:06:08 -0800
> Subject: [PATCH] error-injection: Split FUNCTION_ERROR_INJECTION into syscalls
>  and the rest.
> 
> Split FUNCTION_ERROR_INJECTION into:
> - SYSCALL_ERROR_INJECTION with default y
> - FUNC_ERROR_INJECTION with default n.

OK, syscall is a bit different, it is clearly the boundary of the
functionality, so this seems safe.
IMHO, it is better to extend seccomp framework for testing.

> 
> The former is only used to modify return values of syscalls for security and
> user space testing reasons while the latter is for the rest of error injection
> in the kernel that should only be used to stress test and debug the kernel.
> 
> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> ---
>  arch/arm64/include/asm/syscall_wrapper.h   |  8 ++++----
>  arch/powerpc/include/asm/syscall_wrapper.h |  4 ++--
>  arch/s390/include/asm/syscall_wrapper.h    | 12 ++++++------
>  arch/x86/include/asm/syscall_wrapper.h     |  4 ++--
>  include/asm-generic/error-injection.h      |  1 +
>  include/linux/compat.h                     |  4 ++--
>  include/linux/syscalls.h                   |  4 ++--
>  kernel/fail_function.c                     |  1 +
>  lib/Kconfig.debug                          | 15 +++++++++++++++
>  lib/error-inject.c                         |  6 ++++++
>  10 files changed, 41 insertions(+), 18 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/syscall_wrapper.h b/arch/arm64/include/asm/syscall_wrapper.h
> index d30217c21eff..2c5ca239e88c 100644
> --- a/arch/arm64/include/asm/syscall_wrapper.h
> +++ b/arch/arm64/include/asm/syscall_wrapper.h
> @@ -19,7 +19,7 @@
>  
>  #define COMPAT_SYSCALL_DEFINEx(x, name, ...)						\
>  	asmlinkage long __arm64_compat_sys##name(const struct pt_regs *regs);		\
> -	ALLOW_ERROR_INJECTION(__arm64_compat_sys##name, ERRNO);				\
> +	ALLOW_ERROR_INJECTION(__arm64_compat_sys##name, SYSCALL);				\

But in that case, please don't use ALLOW_ERROR_INJECTION. I don't want to
mix up the function-level error injection(FEI) and syscall error injection.

For this reason, I want to decouple it from the FEI. FEI will be used
for more kernel internal functions under development (or debugging),
which can break something because it will forcibly change the code
behavior and the kernel will be in unexpected state.

Thank you,

>  	static long __se_compat_sys##name(__MAP(x,__SC_LONG,__VA_ARGS__));		\
>  	static inline long __do_compat_sys##name(__MAP(x,__SC_DECL,__VA_ARGS__));	\
>  	asmlinkage long __arm64_compat_sys##name(const struct pt_regs *regs)		\
> @@ -34,7 +34,7 @@
>  
>  #define COMPAT_SYSCALL_DEFINE0(sname)							\
>  	asmlinkage long __arm64_compat_sys_##sname(const struct pt_regs *__unused);	\
> -	ALLOW_ERROR_INJECTION(__arm64_compat_sys_##sname, ERRNO);			\
> +	ALLOW_ERROR_INJECTION(__arm64_compat_sys_##sname, SYSCALL);			\
>  	asmlinkage long __arm64_compat_sys_##sname(const struct pt_regs *__unused)
>  
>  #define COND_SYSCALL_COMPAT(name) 							\
> @@ -50,7 +50,7 @@
>  
>  #define __SYSCALL_DEFINEx(x, name, ...)						\
>  	asmlinkage long __arm64_sys##name(const struct pt_regs *regs);		\
> -	ALLOW_ERROR_INJECTION(__arm64_sys##name, ERRNO);			\
> +	ALLOW_ERROR_INJECTION(__arm64_sys##name, SYSCALL);			\
>  	static long __se_sys##name(__MAP(x,__SC_LONG,__VA_ARGS__));		\
>  	static inline long __do_sys##name(__MAP(x,__SC_DECL,__VA_ARGS__));	\
>  	asmlinkage long __arm64_sys##name(const struct pt_regs *regs)		\
> @@ -69,7 +69,7 @@
>  #define SYSCALL_DEFINE0(sname)							\
>  	SYSCALL_METADATA(_##sname, 0);						\
>  	asmlinkage long __arm64_sys_##sname(const struct pt_regs *__unused);	\
> -	ALLOW_ERROR_INJECTION(__arm64_sys_##sname, ERRNO);			\
> +	ALLOW_ERROR_INJECTION(__arm64_sys_##sname, SYSCALL);			\
>  	asmlinkage long __arm64_sys_##sname(const struct pt_regs *__unused)
>  
>  #define COND_SYSCALL(name)							\
> diff --git a/arch/powerpc/include/asm/syscall_wrapper.h b/arch/powerpc/include/asm/syscall_wrapper.h
> index 67486c67e8a2..ce1148809c6b 100644
> --- a/arch/powerpc/include/asm/syscall_wrapper.h
> +++ b/arch/powerpc/include/asm/syscall_wrapper.h
> @@ -17,7 +17,7 @@ struct pt_regs;
>  
>  #define __SYSCALL_DEFINEx(x, name, ...)						\
>  	long sys##name(const struct pt_regs *regs);			\
> -	ALLOW_ERROR_INJECTION(sys##name, ERRNO);			\
> +	ALLOW_ERROR_INJECTION(sys##name, SYSCALL);			\
>  	static long __se_sys##name(__MAP(x,__SC_LONG,__VA_ARGS__));		\
>  	static inline long __do_sys##name(__MAP(x,__SC_DECL,__VA_ARGS__));	\
>  	long sys##name(const struct pt_regs *regs)			\
> @@ -36,7 +36,7 @@ struct pt_regs;
>  #define SYSCALL_DEFINE0(sname)							\
>  	SYSCALL_METADATA(_##sname, 0);						\
>  	long sys_##sname(const struct pt_regs *__unused);		\
> -	ALLOW_ERROR_INJECTION(sys_##sname, ERRNO);			\
> +	ALLOW_ERROR_INJECTION(sys_##sname, SYSCALL);			\
>  	long sys_##sname(const struct pt_regs *__unused)
>  
>  #define COND_SYSCALL(name)							\
> diff --git a/arch/s390/include/asm/syscall_wrapper.h b/arch/s390/include/asm/syscall_wrapper.h
> index fde7e6b1df48..a253def48cbe 100644
> --- a/arch/s390/include/asm/syscall_wrapper.h
> +++ b/arch/s390/include/asm/syscall_wrapper.h
> @@ -58,7 +58,7 @@
>  
>  #define __S390_SYS_STUBx(x, name, ...)						\
>  	long __s390_sys##name(struct pt_regs *regs);				\
> -	ALLOW_ERROR_INJECTION(__s390_sys##name, ERRNO);				\
> +	ALLOW_ERROR_INJECTION(__s390_sys##name, SYSCALL);				\
>  	long __s390_sys##name(struct pt_regs *regs)				\
>  	{									\
>  		long ret = __do_sys##name(SYSCALL_PT_ARGS(x, regs,		\
> @@ -74,13 +74,13 @@
>  #define COMPAT_SYSCALL_DEFINE0(sname)					\
>  	SYSCALL_METADATA(_##sname, 0);					\
>  	long __s390_compat_sys_##sname(void);				\
> -	ALLOW_ERROR_INJECTION(__s390_compat_sys_##sname, ERRNO);	\
> +	ALLOW_ERROR_INJECTION(__s390_compat_sys_##sname, SYSCALL);	\
>  	long __s390_compat_sys_##sname(void)
>  
>  #define SYSCALL_DEFINE0(sname)						\
>  	SYSCALL_METADATA(_##sname, 0);					\
>  	long __s390x_sys_##sname(void);					\
> -	ALLOW_ERROR_INJECTION(__s390x_sys_##sname, ERRNO);		\
> +	ALLOW_ERROR_INJECTION(__s390x_sys_##sname, SYSCALL);		\
>  	long __s390_sys_##sname(void)					\
>  		__attribute__((alias(__stringify(__s390x_sys_##sname)))); \
>  	long __s390x_sys_##sname(void)
> @@ -100,7 +100,7 @@
>  	long __s390_compat_sys##name(struct pt_regs *regs);				\
>  	long __s390_compat_sys##name(struct pt_regs *regs)				\
>  		__attribute__((alias(__stringify(__se_compat_sys##name))));		\
> -	ALLOW_ERROR_INJECTION(__s390_compat_sys##name, ERRNO);				\
> +	ALLOW_ERROR_INJECTION(__s390_compat_sys##name, SYSCALL);				\
>  	static inline long __do_compat_sys##name(__MAP(x,__SC_DECL,__VA_ARGS__));	\
>  	long __se_compat_sys##name(struct pt_regs *regs);				\
>  	long __se_compat_sys##name(struct pt_regs *regs)				\
> @@ -131,7 +131,7 @@
>  #define SYSCALL_DEFINE0(sname)						\
>  	SYSCALL_METADATA(_##sname, 0);					\
>  	long __s390x_sys_##sname(void);					\
> -	ALLOW_ERROR_INJECTION(__s390x_sys_##sname, ERRNO);		\
> +	ALLOW_ERROR_INJECTION(__s390x_sys_##sname, SYSCALL);		\
>  	long __s390x_sys_##sname(void)
>  
>  #define COND_SYSCALL(name)						\
> @@ -148,7 +148,7 @@
>  		      "Type aliasing is used to sanitize syscall arguments");		\
>  	long __s390x_sys##name(struct pt_regs *regs)					\
>  		__attribute__((alias(__stringify(__se_sys##name))));			\
> -	ALLOW_ERROR_INJECTION(__s390x_sys##name, ERRNO);				\
> +	ALLOW_ERROR_INJECTION(__s390x_sys##name, SYSCALL);				\
>  	static inline long __do_sys##name(__MAP(x,__SC_DECL,__VA_ARGS__));		\
>  	long __se_sys##name(struct pt_regs *regs);					\
>  	__S390_SYS_STUBx(x, name, __VA_ARGS__)						\
> diff --git a/arch/x86/include/asm/syscall_wrapper.h b/arch/x86/include/asm/syscall_wrapper.h
> index fd2669b1cb2d..ca0cd8fa1866 100644
> --- a/arch/x86/include/asm/syscall_wrapper.h
> +++ b/arch/x86/include/asm/syscall_wrapper.h
> @@ -67,13 +67,13 @@ extern long __ia32_sys_ni_syscall(const struct pt_regs *regs);
>  
>  #define __SYS_STUB0(abi, name)						\
>  	long __##abi##_##name(const struct pt_regs *regs);		\
> -	ALLOW_ERROR_INJECTION(__##abi##_##name, ERRNO);			\
> +	ALLOW_ERROR_INJECTION(__##abi##_##name, SYSCALL);			\
>  	long __##abi##_##name(const struct pt_regs *regs)		\
>  		__alias(__do_##name);
>  
>  #define __SYS_STUBx(abi, name, ...)					\
>  	long __##abi##_##name(const struct pt_regs *regs);		\
> -	ALLOW_ERROR_INJECTION(__##abi##_##name, ERRNO);			\
> +	ALLOW_ERROR_INJECTION(__##abi##_##name, SYSCALL);			\
>  	long __##abi##_##name(const struct pt_regs *regs)		\
>  	{								\
>  		return __se_##name(__VA_ARGS__);			\
> diff --git a/include/asm-generic/error-injection.h b/include/asm-generic/error-injection.h
> index fbca56bd9cbc..c4fb52f5b789 100644
> --- a/include/asm-generic/error-injection.h
> +++ b/include/asm-generic/error-injection.h
> @@ -9,6 +9,7 @@ enum {
>  	EI_ETYPE_ERRNO,		/* Return -ERRNO if failure */
>  	EI_ETYPE_ERRNO_NULL,	/* Return -ERRNO or NULL if failure */
>  	EI_ETYPE_TRUE,		/* Return true if failure */
> +	EI_ETYPE_SYSCALL,	/* Return -ERRNO out of syscall */
>  };
>  
>  struct error_injection_entry {
> diff --git a/include/linux/compat.h b/include/linux/compat.h
> index 594357881b0b..21d2fd48f7e2 100644
> --- a/include/linux/compat.h
> +++ b/include/linux/compat.h
> @@ -45,7 +45,7 @@
>  #ifndef COMPAT_SYSCALL_DEFINE0
>  #define COMPAT_SYSCALL_DEFINE0(name) \
>  	asmlinkage long compat_sys_##name(void); \
> -	ALLOW_ERROR_INJECTION(compat_sys_##name, ERRNO); \
> +	ALLOW_ERROR_INJECTION(compat_sys_##name, SYSCALL); \
>  	asmlinkage long compat_sys_##name(void)
>  #endif /* COMPAT_SYSCALL_DEFINE0 */
>  
> @@ -74,7 +74,7 @@
>  		      "Type aliasing is used to sanitize syscall arguments");\
>  	asmlinkage long compat_sys##name(__MAP(x,__SC_DECL,__VA_ARGS__))	\
>  		__attribute__((alias(__stringify(__se_compat_sys##name))));	\
> -	ALLOW_ERROR_INJECTION(compat_sys##name, ERRNO);				\
> +	ALLOW_ERROR_INJECTION(compat_sys##name, SYSCALL);				\
>  	static inline long __do_compat_sys##name(__MAP(x,__SC_DECL,__VA_ARGS__));\
>  	asmlinkage long __se_compat_sys##name(__MAP(x,__SC_LONG,__VA_ARGS__));	\
>  	asmlinkage long __se_compat_sys##name(__MAP(x,__SC_LONG,__VA_ARGS__))	\
> diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
> index a34b0f9a9972..05fc3a0575c0 100644
> --- a/include/linux/syscalls.h
> +++ b/include/linux/syscalls.h
> @@ -210,7 +210,7 @@ static inline int is_syscall_trace_event(struct trace_event_call *tp_event)
>  #define SYSCALL_DEFINE0(sname)					\
>  	SYSCALL_METADATA(_##sname, 0);				\
>  	asmlinkage long sys_##sname(void);			\
> -	ALLOW_ERROR_INJECTION(sys_##sname, ERRNO);		\
> +	ALLOW_ERROR_INJECTION(sys_##sname, SYSCALL);		\
>  	asmlinkage long sys_##sname(void)
>  #endif /* SYSCALL_DEFINE0 */
>  
> @@ -241,7 +241,7 @@ static inline int is_syscall_trace_event(struct trace_event_call *tp_event)
>  		      "Type aliasing is used to sanitize syscall arguments");\
>  	asmlinkage long sys##name(__MAP(x,__SC_DECL,__VA_ARGS__))	\
>  		__attribute__((alias(__stringify(__se_sys##name))));	\
> -	ALLOW_ERROR_INJECTION(sys##name, ERRNO);			\
> +	ALLOW_ERROR_INJECTION(sys##name, SYSCALL);			\
>  	static inline long __do_sys##name(__MAP(x,__SC_DECL,__VA_ARGS__));\
>  	asmlinkage long __se_sys##name(__MAP(x,__SC_LONG,__VA_ARGS__));	\
>  	asmlinkage long __se_sys##name(__MAP(x,__SC_LONG,__VA_ARGS__))	\
> diff --git a/kernel/fail_function.c b/kernel/fail_function.c
> index a7ccd2930c5f..65d3f5db5f3a 100644
> --- a/kernel/fail_function.c
> +++ b/kernel/fail_function.c
> @@ -38,6 +38,7 @@ static unsigned long adjust_error_retval(unsigned long addr, unsigned long retv)
>  	switch (get_injectable_error_type(addr)) {
>  	case EI_ETYPE_NULL:
>  		return 0;
> +	case EI_ETYPE_SYSCALL:
>  	case EI_ETYPE_ERRNO:
>  		if (retv < (unsigned long)-MAX_ERRNO)
>  			return (unsigned long)-EINVAL;
> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index 38545c56bf69..729002405a55 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -1874,8 +1874,12 @@ config NETDEV_NOTIFIER_ERROR_INJECT
>  	  If unsure, say N.
>  
>  config FUNCTION_ERROR_INJECTION
> +        bool
> +
> +config FUNC_ERROR_INJECTION
>  	bool "Fault-injections of functions"
>  	depends on HAVE_FUNCTION_ERROR_INJECTION && KPROBES
> +	select FUNCTION_ERROR_INJECTION
>  	help
>  	  Add fault injections into various functions that are annotated with
>  	  ALLOW_ERROR_INJECTION() in the kernel. BPF may also modify the return
> @@ -1883,6 +1887,17 @@ config FUNCTION_ERROR_INJECTION
>  
>  	  If unsure, say N
>  
> +config SYSCALL_ERROR_INJECTION
> +	bool "Error injections in syscalls"
> +	depends on HAVE_FUNCTION_ERROR_INJECTION && KPROBES
> +	select FUNCTION_ERROR_INJECTION
> +	default y
> +	help
> +	  Allows error injection framework to return errors from syscalls.
> +	  BPF may modify return values of syscalls as well.
> +
> +	  If unsure, say Y
> +
>  config FAULT_INJECTION
>  	bool "Fault-injection framework"
>  	depends on DEBUG_KERNEL
> diff --git a/lib/error-inject.c b/lib/error-inject.c
> index 1afca1b1cdea..9ba868eb8c43 100644
> --- a/lib/error-inject.c
> +++ b/lib/error-inject.c
> @@ -71,6 +71,10 @@ static void populate_error_injection_list(struct error_injection_entry *start,
>  
>  	mutex_lock(&ei_mutex);
>  	for (iter = start; iter < end; iter++) {
> +		if (iter->etype != EI_ETYPE_SYSCALL &&
> +		    !IS_ENABLED(CONFIG_FUNC_ERROR_INJECTION))
> +			continue;
> +
>  		entry = (unsigned long)dereference_symbol_descriptor((void *)iter->addr);
>  
>  		if (!kernel_text_address(entry) ||
> @@ -189,6 +193,8 @@ static const char *error_type_string(int etype)
>  		return "ERRNO_NULL";
>  	case EI_ETYPE_TRUE:
>  		return "TRUE";
> +	case EI_ETYPE_SYSCALL:
> +		return "SYSCALL";
>  	default:
>  		return "(unknown)";
>  	}
> -- 
> 2.30.2
>
  
Benjamin Tissoires Dec. 5, 2022, 5:01 p.m. UTC | #30
On Dec 02 2022, Alexei Starovoitov wrote:
> On Fri, Dec 02, 2022 at 03:55:38PM +0100, Benjamin Tissoires wrote:
> > 
> > 
> > On 12/1/22 22:13, Linus Torvalds wrote:
> > > On Thu, Dec 1, 2022 at 8:59 AM Alexei Starovoitov
> > > <alexei.starovoitov@gmail.com> wrote:
> > > > 
> > > > The hid-bpf framework depends on it.
> > > 
> > > Ok, this is completely unacceptably disgusting hack.
> > 
> > [foreword: I have read the other replies, just replying to this one
> > because it is the explicit ask for a fix]
> > 
> > > 
> > > That needs fixing.
> > > 
> > > > Either hid-bpf or bpf core can add
> > > > "depends on FUNCTION_ERROR_INJECTION"
> > > 
> > > No, it needs to be narrowed down a lot. Nobody sane wants error
> > > injection just because they want some random HID thing.
> > > 
> > > And no, BPF shouldn't need it either.
> > > 
> > > This needs to be narrowed down to the point where HID can say "I want
> > > *this* particular call to be able to be a bpf call.
> > 
> > So, would the following be OK? I still have a few concerns I'll explain
> > after the patch.
> > 
[...]
> > 
> > While this patch removes the need for ALLOW_ERROR_INJECTION it has a
> > couple of drawbacks:
> > - suddenly we lose the nice separation of concerns between bpf core and
> > its users (HID in my case)
> > - it would need to be changed in 6.3 simply because of the previous
> > point, so it is just a temporary fix.

And third, it was bogus because the check_attach_modify_return() test was
inverted.

> 
> Agree, but it works short term.
> A silver lining is BTF_SET approach consumes 4 bytes per mark
> while ALLOW_ERROR_INJECTION consumes 16 bytes for struct error_injection_entry
> and then another 48 bytes per mark for struct ei_entry.
> 
> An alternative would be to define a known prefix like "bpf_modret_"
> or "bpf_hook_" and rename these three functions.

Not a big fan of that idea, sorry. It would work, for sure, but I don't
want to prefix my subsystem API by "bpf_modret_" which would make it
look like it is not part of my subsystem.

> 
> Then there will be no need for #include <linux/hid_bpf.h> in bpf core.

So I took your other advice to look into register_btf_kfunc_id_set().
And given that the internals of kfuncs are no more than a binary list of
ids, we can easily adapt it to have a better API to declare which
functions are fmodret. See [1] for the new series.

The net benefit are that now we can declare those functions outside of
any ALLOW_ERROR_INJECTION, outside of bpf-core, and also we can tag
sleepable ones which was not the case previously.

So I am rather happy with that v2. I'm sure there will be bikeshedding,
but this one looks better than my previous patch here.

> 
> > So I am not sure if this would qualify HID-BPF for 6.2. Please speak up.
> 
> Since that was the only thing I think it's fine to stay in the queue.

My plan is to see if we can get this validated in the next 2 days and if
not, drop it from 6.2 and reintroduce it in 6.3. After the weekend I
realized that delaying HID_BPF for one more release was not too much of an
issue in the end.

Cheers,
Benjamin


[1] https://lore.kernel.org/all/20221205164856.705656-2-benjamin.tissoires@redhat.com/
  
Alexei Starovoitov Dec. 6, 2022, 2:05 a.m. UTC | #31
On Mon, Dec 05, 2022 at 07:50:15AM +0900, Masami Hiramatsu wrote:
> On Fri, 2 Dec 2022 13:27:11 -0800
> Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> 
> > On Fri, Dec 02, 2022 at 10:56:52AM -0500, Theodore Ts'o wrote:
> > > On Thu, Dec 01, 2022 at 05:41:29PM -0800, Alexei Starovoitov wrote:
> > > > 
> > > > The fault injection framework disables individual syscall with zero performance
> > > > overhead comparing to LSM and seccomp mechanisms.
> > > > BPF is not involved here. It's a kprobe in one spot.
> > > > All other syscalls don't notice it.
> > > > It's an attractive way to improve security.
> > > > 
> > > > A BPF prog over syscall can filter by user, cgroup, task and give fine grain
> > > > control over security surface.
> > > > tbh I'm not aware of folks doing "syscall disabling" through command line like
> > > > above (I've only seen it through bpf), but it doesn't mean that somebody will
> > > > not start complaining that their script broke, because distro disabled fault
> > > > injection.
> > > > 
> > > > So should we split FUNCTION_ERROR_INJECTION kconfig into two ?
> > > > And do default N for things like should_failslab() and
> > > > default Y for syscalls?
> > > 
> > > How about calling the latter something like bpf syscall hooks, and not
> > > using the terminology "error injection" in relation to system calls?
> > > I think that might be less confusing.
> > 
> > I think 'syscall error injection' name fits well.
> > It's a generic feature that both kprobes and bpf should be able to use.
> > Here is the patch...
> > 
> > Even with this patch we have 7 failures in BPF selftests.
> > We will fix them later with the same mechanism as we will pick for hid-bpf.
> > 
> > This patch will keep 'syscall disabling' scripts working
> > and bpf syscall adjustment will work too.
> > So no chance of breaking anyone.
> > While actual error injection inside the kernel will be disabled.
> > 
> > Better name suggestions are welcome, of course.
> > 
> > From 2960958f91d1134b1a8f27787875f6b9300f205e Mon Sep 17 00:00:00 2001
> > From: Alexei Starovoitov <ast@kernel.org>
> > Date: Fri, 2 Dec 2022 13:06:08 -0800
> > Subject: [PATCH] error-injection: Split FUNCTION_ERROR_INJECTION into syscalls
> >  and the rest.
> > 
> > Split FUNCTION_ERROR_INJECTION into:
> > - SYSCALL_ERROR_INJECTION with default y
> > - FUNC_ERROR_INJECTION with default n.
> 
> OK, syscall is a bit different, it is clearly the boundary of the
> functionality, so this seems safe.
> IMHO, it is better to extend seccomp framework for testing.

seccomp doesn't support eBPF

> > 
> > The former is only used to modify return values of syscalls for security and
> > user space testing reasons while the latter is for the rest of error injection
> > in the kernel that should only be used to stress test and debug the kernel.
> > 
> > Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> > ---
> >  arch/arm64/include/asm/syscall_wrapper.h   |  8 ++++----
> >  arch/powerpc/include/asm/syscall_wrapper.h |  4 ++--
> >  arch/s390/include/asm/syscall_wrapper.h    | 12 ++++++------
> >  arch/x86/include/asm/syscall_wrapper.h     |  4 ++--
> >  include/asm-generic/error-injection.h      |  1 +
> >  include/linux/compat.h                     |  4 ++--
> >  include/linux/syscalls.h                   |  4 ++--
> >  kernel/fail_function.c                     |  1 +
> >  lib/Kconfig.debug                          | 15 +++++++++++++++
> >  lib/error-inject.c                         |  6 ++++++
> >  10 files changed, 41 insertions(+), 18 deletions(-)
> > 
> > diff --git a/arch/arm64/include/asm/syscall_wrapper.h b/arch/arm64/include/asm/syscall_wrapper.h
> > index d30217c21eff..2c5ca239e88c 100644
> > --- a/arch/arm64/include/asm/syscall_wrapper.h
> > +++ b/arch/arm64/include/asm/syscall_wrapper.h
> > @@ -19,7 +19,7 @@
> >  
> >  #define COMPAT_SYSCALL_DEFINEx(x, name, ...)						\
> >  	asmlinkage long __arm64_compat_sys##name(const struct pt_regs *regs);		\
> > -	ALLOW_ERROR_INJECTION(__arm64_compat_sys##name, ERRNO);				\
> > +	ALLOW_ERROR_INJECTION(__arm64_compat_sys##name, SYSCALL);				\
> 
> But in that case, please don't use ALLOW_ERROR_INJECTION. I don't want to
> mix up the function-level error injection(FEI) and syscall error injection.

Are you suggesting to copy-paste ALLOW_ERROR_INJECTION logic into another
special section, another vmlinux.lds.h hack, copy-paste of lib/error-inject.c ?
Only to have a different name? Sorry, but that makes no sense.
syscalls return errno towards user space,
while the rest of 'error inject' funcs return errno towards the kernel.
Both are quite similar. There is no need to duplicate:
debugfs_create_dir("error_injection", ...
fault_create_debugfs_attr("fail_function", ...

> For this reason, I want to decouple it from the FEI. FEI will be used
> for more kernel internal functions under development (or debugging),
> which can break something because it will forcibly change the code
> behavior and the kernel will be in unexpected state.

There is no 'unexpected state'.
When Josef added BPF_ALLOW_ERROR_INJECTION() in include/linux/bpf.h
we marked several functions in fs/btrfs/ this way.
Later more functions were marked.
The callers of all those functions have to be ready to deal with errors.
If any of the currently marked functions can oops the kernel it's a bug
in the caller and it has to be fixed, because normal execution can
sooner or later return similar error.
Consider ALLOW_ERROR_INJECTION(should_failslab, ERRNO);
That function was specifically added to exercise memory allocation errors.
The bpf error injection mechanism is not the only one that can generate
the errors.

Later you renamed BPF_ALLOW_ERROR_INJECTION into ALLOW_ERROR_INJECTION in
commit 540adea3809f ("error-injection: Separate error-injection from kprobe"),
but the main purpose of "bpf error injection" stayed the same.
We didn't mark random kernel functions as 'inject errors here'.
Only those whose callers must do sane things in case of errors.

So attemp to 'will be used for more kernel internal functions under development'
doesn't fit the spirit for bpf error injection as it is today.
For this kind of random kernel injection please use some other mechanism.
We cannot allow bpf to change return values of random function.

As far as users of this [BPF_]ALLOW_ERROR_INJECTION...
I couldn't find any blog, article or post that is talking about
text interface to tweak return values /sys/kernel/debug/fail_function.
Only links to kernel doc.
But there are plenty of BPF users of error injection. Like:
https://github.com/iovisor/bcc/blob/master/tools/inject_example.txt
https://chaos-mesh.org/docs/simulate-kernel-chaos-on-kubernetes/
https://github.com/iovisor/bpftrace/blob/master/docs/reference_guide.md#20-override-override-return-value
so we should tailor this 'error injection' facility to actual users
and not hypothetical 'more kernel internal functions under development'.
  

Patch

diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index c3c0b077ade3..9ee72d8860c3 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1874,8 +1874,14 @@  config NETDEV_NOTIFIER_ERROR_INJECT
 	  If unsure, say N.
 
 config FUNCTION_ERROR_INJECTION
-	def_bool y
+	bool "Fault-injections of functions"
 	depends on HAVE_FUNCTION_ERROR_INJECTION && KPROBES
+	help
+	  Add fault injections into various functions that are annotated with
+	  ALLOW_ERROR_INJECTION() in the kernel. BPF may also modify the return
+	  value of theses functions. This is useful to test error paths of code.
+
+	  If unsure, say N
 
 config FAULT_INJECTION
 	bool "Fault-injection framework"