[v2] panic: Taint kernel if fault injection has been used

Message ID 167019256481.3792653.4369637751468386073.stgit@devnote3
State New
Headers
Series [v2] panic: Taint kernel if fault injection has been used |

Commit Message

Masami Hiramatsu (Google) Dec. 4, 2022, 10:22 p.m. UTC
  From: Masami Hiramatsu (Google) <mhiramat@kernel.org>

Since the function error injection framework in the fault injection
subsystem can change the function code flow forcibly, it may cause
unexpected behavior (and that is the purpose of this feature) even
if it is applied to the ALLOW_ERROR_INJECTION functions.
So this feature must be used only for debugging or testing purpose.

To identify this in the kernel oops message, add a new taint flag
for the fault injection. This taint flag will be set by either
function error injection is used or the BPF use the kprobe_override
on error injectable functions (identified by ALLOW_ERROR_INJECTION).

Link: https://lore.kernel.org/all/20221121104403.1545f9b5@gandalf.local.home/T/#u

Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
---
 Changes in v2:
  - Update kernel-chktaint.
---
 Documentation/admin-guide/tainted-kernels.rst |    5 +++++
 include/linux/panic.h                         |    3 ++-
 kernel/fail_function.c                        |    2 ++
 kernel/panic.c                                |    1 +
 kernel/trace/bpf_trace.c                      |    2 ++
 tools/debugging/kernel-chktaint               |    8 ++++++++
 6 files changed, 20 insertions(+), 1 deletion(-)
  

Comments

Alexei Starovoitov Dec. 4, 2022, 10:30 p.m. UTC | #1
On Mon, Dec 05, 2022 at 07:22:44AM +0900, Masami Hiramatsu (Google) wrote:
> From: Masami Hiramatsu (Google) <mhiramat@kernel.org>
> 
> Since the function error injection framework in the fault injection
> subsystem can change the function code flow forcibly, it may cause
> unexpected behavior (and that is the purpose of this feature) even
> if it is applied to the ALLOW_ERROR_INJECTION functions.
> So this feature must be used only for debugging or testing purpose.

The whole idea of tainting for kernel debugging is questionable.
There are many other *inject* kconfigs and other debug flags
for link lists, RCU, sleeping, etc.
None of them taint the kernel.

> To identify this in the kernel oops message, add a new taint flag

Have you ever seen a single oops message because of this particular
error injection?

> for the fault injection. This taint flag will be set by either
> function error injection is used or the BPF use the kprobe_override
> on error injectable functions (identified by ALLOW_ERROR_INJECTION).

...

>  	/* set the new array to event->tp_event and set event->prog */
> +	if (prog->kprobe_override)
> +		add_taint(TAINT_FAULT_INJECTED, LOCKDEP_NOW_UNRELIABLE);

Nack for bpf bits.
  
Masami Hiramatsu (Google) Dec. 4, 2022, 10:59 p.m. UTC | #2
On Sun, 4 Dec 2022 14:30:01 -0800
Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:

> On Mon, Dec 05, 2022 at 07:22:44AM +0900, Masami Hiramatsu (Google) wrote:
> > From: Masami Hiramatsu (Google) <mhiramat@kernel.org>
> > 
> > Since the function error injection framework in the fault injection
> > subsystem can change the function code flow forcibly, it may cause
> > unexpected behavior (and that is the purpose of this feature) even
> > if it is applied to the ALLOW_ERROR_INJECTION functions.
> > So this feature must be used only for debugging or testing purpose.
> 
> The whole idea of tainting for kernel debugging is questionable.
> There are many other *inject* kconfigs and other debug flags
> for link lists, RCU, sleeping, etc.
> None of them taint the kernel.
> 
> > To identify this in the kernel oops message, add a new taint flag
> 
> Have you ever seen a single oops message because of this particular
> error injection?

No, but there is no guarantee that the FEI doesn't cause any issue
in the future too. If it happens, we need to know the precise
information about what FEI/bpf does.
FEI is a kind of temporal Livepatch for testing. If Livepatch taints
the kernel, why doesn't the FEI taint it too?

> 
> > for the fault injection. This taint flag will be set by either
> > function error injection is used or the BPF use the kprobe_override
> > on error injectable functions (identified by ALLOW_ERROR_INJECTION).
> 
> ...
> 
> >  	/* set the new array to event->tp_event and set event->prog */
> > +	if (prog->kprobe_override)
> > +		add_taint(TAINT_FAULT_INJECTED, LOCKDEP_NOW_UNRELIABLE);
> 
> Nack for bpf bits.

I think this is needed especially for bpf bits. If we see this flag,
we can ask reporters to share the bpf programs which they used.

Thank you,
  
Alexei Starovoitov Dec. 6, 2022, 2:17 a.m. UTC | #3
On Mon, Dec 05, 2022 at 07:59:21AM +0900, Masami Hiramatsu wrote:
> On Sun, 4 Dec 2022 14:30:01 -0800
> Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> 
> > On Mon, Dec 05, 2022 at 07:22:44AM +0900, Masami Hiramatsu (Google) wrote:
> > > From: Masami Hiramatsu (Google) <mhiramat@kernel.org>
> > > 
> > > Since the function error injection framework in the fault injection
> > > subsystem can change the function code flow forcibly, it may cause
> > > unexpected behavior (and that is the purpose of this feature) even
> > > if it is applied to the ALLOW_ERROR_INJECTION functions.
> > > So this feature must be used only for debugging or testing purpose.
> > 
> > The whole idea of tainting for kernel debugging is questionable.
> > There are many other *inject* kconfigs and other debug flags
> > for link lists, RCU, sleeping, etc.
> > None of them taint the kernel.
> > 
> > > To identify this in the kernel oops message, add a new taint flag
> > 
> > Have you ever seen a single oops message because of this particular
> > error injection?
> 
> No, but there is no guarantee that the FEI doesn't cause any issue
> in the future too. If it happens, we need to know the precise
> information about what FEI/bpf does.
> FEI is a kind of temporal Livepatch for testing. If Livepatch taints
> the kernel, why doesn't the FEI taint it too?

Live patching can replace an arbitrary function and the kernel has
no visibility into what KLP module is doing.
While 'bpf error injection' is predictable.
The functions marked with [BPF_]ALLOW_ERROR_INJECTION can return errors
in the normal execution. So the callers of these functions have to deal with errors.

If kernel panics on such injected error it potentially would have paniced
on it anyway. At this point crash dump might be necessary to debug.
Whether oops happened because of bpf, kprobe or normal execution
doesn't matter much. The bug is in the caller that wasn't prepared
to deal with that error.

One can still walk all bpf progs from crash dump with tool "drgn"
(it has nice scripts to examine the dumps) or "crash" or other tools.

> > 
> > > for the fault injection. This taint flag will be set by either
> > > function error injection is used or the BPF use the kprobe_override
> > > on error injectable functions (identified by ALLOW_ERROR_INJECTION).
> > 
> > ...
> > 
> > >  	/* set the new array to event->tp_event and set event->prog */
> > > +	if (prog->kprobe_override)
> > > +		add_taint(TAINT_FAULT_INJECTED, LOCKDEP_NOW_UNRELIABLE);
> > 
> > Nack for bpf bits.
> 
> I think this is needed especially for bpf bits. If we see this flag,
> we can ask reporters to share the bpf programs which they used.

You can ask reporters to share bpf progs, but you can repro
the oops just as well without bpf. It's not bpf to blame, but the
bug in the caller that you should worry about.
  
Masami Hiramatsu (Google) Dec. 6, 2022, 7:20 a.m. UTC | #4
On Mon, 5 Dec 2022 18:17:00 -0800
Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:

> On Mon, Dec 05, 2022 at 07:59:21AM +0900, Masami Hiramatsu wrote:
> > On Sun, 4 Dec 2022 14:30:01 -0800
> > Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> > 
> > > On Mon, Dec 05, 2022 at 07:22:44AM +0900, Masami Hiramatsu (Google) wrote:
> > > > From: Masami Hiramatsu (Google) <mhiramat@kernel.org>
> > > > 
> > > > Since the function error injection framework in the fault injection
> > > > subsystem can change the function code flow forcibly, it may cause
> > > > unexpected behavior (and that is the purpose of this feature) even
> > > > if it is applied to the ALLOW_ERROR_INJECTION functions.
> > > > So this feature must be used only for debugging or testing purpose.
> > > 
> > > The whole idea of tainting for kernel debugging is questionable.
> > > There are many other *inject* kconfigs and other debug flags
> > > for link lists, RCU, sleeping, etc.
> > > None of them taint the kernel.
> > > 
> > > > To identify this in the kernel oops message, add a new taint flag
> > > 
> > > Have you ever seen a single oops message because of this particular
> > > error injection?
> > 
> > No, but there is no guarantee that the FEI doesn't cause any issue
> > in the future too. If it happens, we need to know the precise
> > information about what FEI/bpf does.
> > FEI is a kind of temporal Livepatch for testing. If Livepatch taints
> > the kernel, why doesn't the FEI taint it too?
> 
> Live patching can replace an arbitrary function and the kernel has
> no visibility into what KLP module is doing.
> While 'bpf error injection' is predictable.

No, not much predictable because the kernel code can be changed.

> The functions marked with [BPF_]ALLOW_ERROR_INJECTION can return errors
> in the normal execution. So the callers of these functions have to deal with errors.

Right, but it might change something before checking the input, and
if it rejects the sane input, the caller may go into unexpected
status (e.g. the caller already checked input value, and does not
expect the call is fail). Such behaviors are buggy, yes. And the
FEI is designed for finding such buggy behavior.
(e.g. injecting error, but the caller passed successfully, it
means the caller code has some issue.)

> If kernel panics on such injected error it potentially would have paniced
> on it anyway.

Yes, but that doesn't cover all cases. If the function doesn't have
any internal state but returns an error according to the input,
FEI can make it return an error even if the input is correct.
And if it cause a kernel panic, that is a panic that must not
happen without FEI.

Thus, the ALLOW_ERROR_INJECTION should only be applied to the
function which has so-called 'side-effect', e.g. memory allocation,
external data (except for input data) read, etc. that could cause
an error regardless of the input value. Then the caller must
handle such errors.

> At this point crash dump might be necessary to debug.

Yes. So the TAINT flag can help. Please consider that the TAINT flag
doesn't mean you are guilty, but this is just a hint for debugging.
(good for the first triage)

> Whether oops happened because of bpf, kprobe or normal execution
> doesn't matter much. The bug is in the caller that wasn't prepared
> to deal with that error.
> 
> One can still walk all bpf progs from crash dump with tool "drgn"
> (it has nice scripts to examine the dumps) or "crash" or other tools.
> 
> > > 
> > > > for the fault injection. This taint flag will be set by either
> > > > function error injection is used or the BPF use the kprobe_override
> > > > on error injectable functions (identified by ALLOW_ERROR_INJECTION).
> > > 
> > > ...
> > > 
> > > >  	/* set the new array to event->tp_event and set event->prog */
> > > > +	if (prog->kprobe_override)
> > > > +		add_taint(TAINT_FAULT_INJECTED, LOCKDEP_NOW_UNRELIABLE);
> > > 
> > > Nack for bpf bits.
> > 
> > I think this is needed especially for bpf bits. If we see this flag,
> > we can ask reporters to share the bpf programs which they used.
> 
> You can ask reporters to share bpf progs, but you can repro
> the oops just as well without bpf. It's not bpf to blame, but the
> bug in the caller that you should worry about.

I don't blame the bpf, but just it points that undesigned behavior has
been injected. So we have to take it into account.

Thank you,
  
Alexei Starovoitov Dec. 7, 2022, 4:01 a.m. UTC | #5
On Tue, Dec 06, 2022 at 04:20:35PM +0900, Masami Hiramatsu wrote:
> On Mon, 5 Dec 2022 18:17:00 -0800
> Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> 
> > On Mon, Dec 05, 2022 at 07:59:21AM +0900, Masami Hiramatsu wrote:
> > > On Sun, 4 Dec 2022 14:30:01 -0800
> > > Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> > > 
> > > > On Mon, Dec 05, 2022 at 07:22:44AM +0900, Masami Hiramatsu (Google) wrote:
> > > > > From: Masami Hiramatsu (Google) <mhiramat@kernel.org>
> > > > > 
> > > > > Since the function error injection framework in the fault injection
> > > > > subsystem can change the function code flow forcibly, it may cause
> > > > > unexpected behavior (and that is the purpose of this feature) even
> > > > > if it is applied to the ALLOW_ERROR_INJECTION functions.
> > > > > So this feature must be used only for debugging or testing purpose.
> > > > 
> > > > The whole idea of tainting for kernel debugging is questionable.
> > > > There are many other *inject* kconfigs and other debug flags
> > > > for link lists, RCU, sleeping, etc.
> > > > None of them taint the kernel.
> > > > 
> > > > > To identify this in the kernel oops message, add a new taint flag
> > > > 
> > > > Have you ever seen a single oops message because of this particular
> > > > error injection?
> > > 
> > > No, but there is no guarantee that the FEI doesn't cause any issue
> > > in the future too. If it happens, we need to know the precise
> > > information about what FEI/bpf does.
> > > FEI is a kind of temporal Livepatch for testing. If Livepatch taints
> > > the kernel, why doesn't the FEI taint it too?
> > 
> > Live patching can replace an arbitrary function and the kernel has
> > no visibility into what KLP module is doing.
> > While 'bpf error injection' is predictable.
> 
> No, not much predictable because the kernel code can be changed.
> 
> > The functions marked with [BPF_]ALLOW_ERROR_INJECTION can return errors
> > in the normal execution. So the callers of these functions have to deal with errors.
> 
> Right, but it might change something before checking the input, and
> if it rejects the sane input, the caller may go into unexpected
> status (e.g. the caller already checked input value, and does not
> expect the call is fail). Such behaviors are buggy, yes. And the
> FEI is designed for finding such buggy behavior.
> (e.g. injecting error, but the caller passed successfully, it
> means the caller code has some issue.)
> 
> > If kernel panics on such injected error it potentially would have paniced
> > on it anyway.
> 
> Yes, but that doesn't cover all cases. If the function doesn't have
> any internal state but returns an error according to the input,
> FEI can make it return an error even if the input is correct.
> And if it cause a kernel panic, that is a panic that must not
> happen without FEI.
> 
> Thus, the ALLOW_ERROR_INJECTION should only be applied to the
> function which has so-called 'side-effect', e.g. memory allocation,
> external data (except for input data) read, etc. that could cause
> an error regardless of the input value. Then the caller must
> handle such errors.

Not quite. I think you're confusing functions with 'side effect'
with 'pure' functions.
Your point about 'checking args before the call' applies to pure functions.
We have some too: git grep __pure.
Most of the time the compiler can identify the functions that return
the same value with the same args, but sometimes it needs help
and we mark such functions.
Clearly such functions should never be marked as 'error inject'
because changing return value of such function might lead to
wrong code and _it will not be a kernel bug_.
The compiler optimizations rely on the function being pure.
Live kernel patching should be very careful with __pure functions too.
No idea whether they do this now or not.

Of course, there is a category of functions (with or without side effects)
which return values should not be changed by error injecton mechanism.
That's a given. Applying ALLOW_ERROR_INJECTION should not to be taken lightly.

> 
> > At this point crash dump might be necessary to debug.
> 
> Yes. So the TAINT flag can help. Please consider that the TAINT flag
> doesn't mean you are guilty, but this is just a hint for debugging.
> (good for the first triage)

I think you misunderstand the reason behind 'tainted' flags.
It's 'hint for debugging' only on the surface.
See Documentation/admin-guide/tainted-kernels.rst
"... That's why bug reports
from tainted kernels will often be ignored by developers, hence try to reproduce
problems with an untainted kernel."

When 'error injection' finds a kernel bug the kernel developers need to
look into it regardless whether it's syzbot error injection
or whatever other mechanism.

To change the topic to something ... else...

We've just hit this panic using rethook.
[   49.235708] ==================================================================
[   49.236243] BUG: KASAN: use-after-free in rethook_try_get+0x7e/0x380
[   49.236693] Read of size 8 at addr ffff888102e62c88 by task test_progs/1688
[   49.240398]  kasan_report+0x90/0x190
[   49.240934]  rethook_try_get+0x7e/0x380
[   49.244885]  fprobe_handler.part.1+0x119/0x1f0
[   49.245505]  arch_ftrace_ops_list_func+0x17d/0x1d0
[   49.246544]  ftrace_regs_call+0x5/0x52
[   49.247411]  bpf_fentry_test1+0x5/0x10

[   49.262578] Allocated by task 1692:
[   49.262804]  kasan_save_stack+0x1c/0x40
[   49.263059]  kasan_set_track+0x21/0x30
[   49.263335]  __kasan_kmalloc+0x7a/0x90
[   49.263624]  rethook_alloc+0x2c/0xa0
[   49.263879]  fprobe_init_rethook+0x6d/0x170
[   49.264154]  register_fprobe_ips+0xae/0x130

[   49.265938] Freed by task 0:
[   49.266153]  kasan_save_stack+0x1c/0x40
[   49.266440]  kasan_set_track+0x21/0x30
[   49.266705]  kasan_save_free_info+0x26/0x40
[   49.266995]  __kasan_slab_free+0x103/0x190
[   49.267282]  __kmem_cache_free+0x1b7/0x3a0
[   49.267559]  rcu_core+0x4d8/0xd50

[   49.268181] Last potentially related work creation:
[   49.268565]  kasan_save_stack+0x1c/0x40
[   49.268898]  __kasan_record_aux_stack+0xa1/0xb0
[   49.269260]  call_rcu+0x47/0x360
[   49.269526]  unregister_fprobe+0x47/0x80

[   49.281382] general protection fault, probably for non-canonical address 0x57e006e00000000: 0000 [#1] PREEMPT SMP KASAN
[   49.282226] CPU: 6 PID: 1688 Comm: test_progs Tainted: G    B      O       6.1.0-rc7-01508-gf0c5a2d9f234 #4343
[   49.283751] RIP: 0010:rethook_trampoline_handler+0xff/0x1d0
[   49.289900] Call Trace:
[   49.290083]  <TASK>
[   49.290248]  arch_rethook_trampoline_callback+0x6c/0xa0
[   49.290631]  arch_rethook_trampoline+0x2c/0x50
[   49.290964]  ? lock_release+0xad/0x3f0
[   49.291245]  ? bpf_prog_test_run_tracing+0x235/0x380
[   49.291609]  trace_clock_x86_tsc+0x10/0x10

This is just running bpf selftests in parallel mode on 16-cpu VM on bpf-next.
Notice 'Tained' flags.
Please take a look.

Thanks!
  
Steven Rostedt Dec. 7, 2022, 4:39 a.m. UTC | #6
On Tue, 6 Dec 2022 20:01:46 -0800
Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:


> >   
> > > At this point crash dump might be necessary to debug.  
> > 
> > Yes. So the TAINT flag can help. Please consider that the TAINT flag
> > doesn't mean you are guilty, but this is just a hint for debugging.
> > (good for the first triage)  
> 
> I think you misunderstand the reason behind 'tainted' flags.
> It's 'hint for debugging' only on the surface.
> See Documentation/admin-guide/tainted-kernels.rst
> "... That's why bug reports
> from tainted kernels will often be ignored by developers, hence try to reproduce
> problems with an untainted kernel."

You conveniently left out the first part of that paragraph. Showing just a
portion of a statement can be very misleading. Let me add the whole
paragraph here:

   The kernel will mark itself as 'tainted' when something occurs that
   might be relevant later when investigating problems. Don't worry too much about this,
   most of the time it's not a problem to run a tainted kernel; the information is
   mainly of interest once someone wants to investigate some problem, as its real
   cause might be the event that got the kernel tainted. That's why bug reports
   from tainted kernels will often be ignored by developers, hence try to reproduce
   problems with an untainted kernel.

Let me stress the very first sentence above:

   The kernel will mark itself as 'tainted' when something occurs that
   might be relevant later when investigating problems.

I think you are the one that is misunderstanding what a taint is. It most
definitely is about giving hints for debugging. That's why the very first
sentence of that paragraph, as well as the entire document, explicitly
states "might be relevant later when investigating problems".


> 
> When 'error injection' finds a kernel bug the kernel developers need to
> look into it regardless whether it's syzbot error injection
> or whatever other mechanism.
> 

And this is a very useful taint. Just like:

  2  _/S       4  kernel running on an out of specification system
  5  _/B      32  bad page referenced or some unexpected page flags
  7  _/D     128  kernel died recently, i.e. there was an OOPS or BUG
 10  _/C    1024  staging driver was loaded
 11  _/I    2048  workaround for bug in platform firmware applied
 14  _/L   16384  soft lockup occurred 
 17  _/T  131072  kernel was built with the struct randomization plugin

Any of the above should not be ignored by developers, but they are useful
hints for debugging the issue.


> To change the topic to something ... else...
> 
> We've just hit this panic using rethook.
> [   49.235708] ==================================================================
> [   49.236243] BUG: KASAN: use-after-free in rethook_try_get+0x7e/0x380
> [   49.236693] Read of size 8 at addr ffff888102e62c88 by task test_progs/1688
> [   49.240398]  kasan_report+0x90/0x190
> [   49.240934]  rethook_try_get+0x7e/0x380
> [   49.244885]  fprobe_handler.part.1+0x119/0x1f0
> [   49.245505]  arch_ftrace_ops_list_func+0x17d/0x1d0
> [   49.246544]  ftrace_regs_call+0x5/0x52
> [   49.247411]  bpf_fentry_test1+0x5/0x10
> 
> [   49.262578] Allocated by task 1692:
> [   49.262804]  kasan_save_stack+0x1c/0x40
> [   49.263059]  kasan_set_track+0x21/0x30
> [   49.263335]  __kasan_kmalloc+0x7a/0x90
> [   49.263624]  rethook_alloc+0x2c/0xa0
> [   49.263879]  fprobe_init_rethook+0x6d/0x170
> [   49.264154]  register_fprobe_ips+0xae/0x130
> 
> [   49.265938] Freed by task 0:
> [   49.266153]  kasan_save_stack+0x1c/0x40
> [   49.266440]  kasan_set_track+0x21/0x30
> [   49.266705]  kasan_save_free_info+0x26/0x40
> [   49.266995]  __kasan_slab_free+0x103/0x190
> [   49.267282]  __kmem_cache_free+0x1b7/0x3a0
> [   49.267559]  rcu_core+0x4d8/0xd50
> 
> [   49.268181] Last potentially related work creation:
> [   49.268565]  kasan_save_stack+0x1c/0x40
> [   49.268898]  __kasan_record_aux_stack+0xa1/0xb0
> [   49.269260]  call_rcu+0x47/0x360
> [   49.269526]  unregister_fprobe+0x47/0x80
> 
> [   49.281382] general protection fault, probably for non-canonical address 0x57e006e00000000: 0000 [#1] PREEMPT SMP KASAN
> [   49.282226] CPU: 6 PID: 1688 Comm: test_progs Tainted: G    B      O       6.1.0-rc7-01508-gf0c5a2d9f234 #4343
> [   49.283751] RIP: 0010:rethook_trampoline_handler+0xff/0x1d0
> [   49.289900] Call Trace:
> [   49.290083]  <TASK>
> [   49.290248]  arch_rethook_trampoline_callback+0x6c/0xa0
> [   49.290631]  arch_rethook_trampoline+0x2c/0x50
> [   49.290964]  ? lock_release+0xad/0x3f0
> [   49.291245]  ? bpf_prog_test_run_tracing+0x235/0x380
> [   49.291609]  trace_clock_x86_tsc+0x10/0x10
> 
> This is just running bpf selftests in parallel mode on 16-cpu VM on bpf-next.
> Notice 'Tained' flags.
> Please take a look.
> 

"G - Proprietary module" - "O - out of tree module"

Can you reproduce this without those taints?

-- Steve
  
Steven Rostedt Dec. 7, 2022, 4:41 a.m. UTC | #7
On Tue, 6 Dec 2022 23:39:47 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:

> "G - Proprietary module" - "O - out of tree module"

Sorry, "P" is proprietary, "G" is still GPL. But it is an out of tree
module still.  ;-)

-- Steve
  
Alexei Starovoitov Dec. 7, 2022, 4:45 a.m. UTC | #8
On Tue, Dec 6, 2022 at 8:39 PM Steven Rostedt <rostedt@goodmis.org> wrote:
> >
> > [   49.281382] general protection fault, probably for non-canonical address 0x57e006e00000000: 0000 [#1] PREEMPT SMP KASAN
> > [   49.282226] CPU: 6 PID: 1688 Comm: test_progs Tainted: G    B      O       6.1.0-rc7-01508-gf0c5a2d9f234 #4343
> > [   49.283751] RIP: 0010:rethook_trampoline_handler+0xff/0x1d0
> > [   49.289900] Call Trace:
> > [   49.290083]  <TASK>
> > [   49.290248]  arch_rethook_trampoline_callback+0x6c/0xa0
> > [   49.290631]  arch_rethook_trampoline+0x2c/0x50
> > [   49.290964]  ? lock_release+0xad/0x3f0
> > [   49.291245]  ? bpf_prog_test_run_tracing+0x235/0x380
> > [   49.291609]  trace_clock_x86_tsc+0x10/0x10
> >
> > This is just running bpf selftests in parallel mode on 16-cpu VM on bpf-next.
> > Notice 'Tained' flags.
> > Please take a look.
> >
>
> "G - Proprietary module" - "O - out of tree module"
>
> Can you reproduce this without those taints?

Lol. That question is exactly the reason why my Nack stands.
  
Steven Rostedt Dec. 7, 2022, 5:18 a.m. UTC | #9
On December 6, 2022 11:45:17 PM EST, Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
>On Tue, Dec 6, 2022 at 8:39 PM Steven Rostedt <rostedt@goodmis.org> wrote:
>> >
>> > [   49.281382] general protection fault, probably for non-canonical address 0x57e006e00000000: 0000 [#1] PREEMPT SMP KASAN
>> > [   49.282226] CPU: 6 PID: 1688 Comm: test_progs Tainted: G    B      O       6.1.0-rc7-01508-gf0c5a2d9f234 #4343
>> > [   49.283751] RIP: 0010:rethook_trampoline_handler+0xff/0x1d0
>> > [   49.289900] Call Trace:
>> > [   49.290083]  <TASK>
>> > [   49.290248]  arch_rethook_trampoline_callback+0x6c/0xa0
>> > [   49.290631]  arch_rethook_trampoline+0x2c/0x50
>> > [   49.290964]  ? lock_release+0xad/0x3f0
>> > [   49.291245]  ? bpf_prog_test_run_tracing+0x235/0x380
>> > [   49.291609]  trace_clock_x86_tsc+0x10/0x10
>> >
>> > This is just running bpf selftests in parallel mode on 16-cpu VM on bpf-next.
>> > Notice 'Tained' flags.
>> > Please take a look.
>> >
>>
>> "G - Proprietary module" - "O - out of tree module"
>>
>> Can you reproduce this without those taints?
>
>Lol. That question is exactly the reason why my Nack stands.

I only said the above *because* of your comment ;-) 

-- Steve
  
Steven Rostedt Dec. 7, 2022, 12:48 p.m. UTC | #10
On Tue, 6 Dec 2022 20:45:17 -0800
Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:

> > "G - Proprietary module" - "O - out of tree module"
> >
> > Can you reproduce this without those taints?  
> 
> Lol. That question is exactly the reason why my Nack stands.

First, that's a BS reason for a NACK.

But in all seriousness, what I would actually ask (and what I'll ask now)
is, what module did you use that is out of tree, and was it relevant to
this test?

That's a reasonable question to ask, and one that only gets asked with a
taint.


If there's a BPF injection taint, one can ask that same question, as the
bug may happen sometime after the injection but be caused by that injection,
and not be in the backtrace. Not all kernel developers have the access to
debugging utilities that backend production servers have. A lot of bugs that
kernel developers debug are from someone's laptop. Where all they have is
that backtrace. If a tool or root kit, added function error injection, it
would be extremely useful information to debug what happened.

I don't understand the push back here.

-- Steve
  
Alexei Starovoitov Dec. 8, 2022, 4:36 a.m. UTC | #11
On Wed, Dec 07, 2022 at 07:48:06AM -0500, Steven Rostedt wrote:
> On Tue, 6 Dec 2022 20:45:17 -0800
> Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> 
> > > "G - Proprietary module" - "O - out of tree module"
> > >
> > > Can you reproduce this without those taints?  
> > 
> > Lol. That question is exactly the reason why my Nack stands.
> 
> First, that's a BS reason for a NACK.
> 
> But in all seriousness, what I would actually ask (and what I'll ask now)
> is, what module did you use that is out of tree, and was it relevant to
> this test?
> 
> That's a reasonable question to ask, and one that only gets asked with a
> taint.

When we receive bug reports in bpf@vger the only question we ask is:
"How to reproduce this bug?"
We ignoring taint completely.
tbh I didn't even know that our BPF CI causes taint until that email.

In the previous email I said that the bug report comes from bpf selftest on bpf-next.
Clearly there is no ambiguity that this is as upstream as it can get.
Yet for 2 days this 'taint' arguing is preventing people from looking at the bug.
And that happens all the time on lkml. Somebody reports a bug and kernel devs
jump on the poor person:
"Can you repro without taint?",
"Can you repro with upstream kernel?"
This is discouraging.
The 'taint' concept makes it easier for kernel devs to ignore bug reports
and push back on the reporter. Do it few times and people stop reporting bugs.
Say, this particular bug in rethook was found by one of our BPF CI developers.
They're not very familiar with the kernel, but they can see plenty of 'rethook'
references in the stack trace, lookup MAINTAINER file and ping Massami,
but to the question "can you repro without taint?" they can only say NO,
because this is how our CI works. So they will keep silence and the bug will be lost.
That's not the only reason why I'm against generalizing 'taint'.
Tainting because HW is misbehaving makes sense, but tainting because
of OoO module or because of live-patching does not.
It becomes an excuse that people abuse.

Right now syzbot is finding all sorts of bugs. Most of the time syzbot
turns error injection on to find those allocation issues.
If syzbot reports will start coming as tainted there will be even less
attention to them. That will not be good.

> If there's a BPF injection taint, one can ask that same question, as the
> bug may happen sometime after the injection but be caused by that injection,
> and not be in the backtrace. Not all kernel developers have the access to
> debugging utilities that backend production servers have. A lot of bugs that
> kernel developers debug are from someone's laptop.

I would have to disagree.
We see few human reported bugs and prioritize them higher,
but majority are coming from the production fleet, test tiers,
syzbot, CIs, and automated things.

> Where all they have is
> that backtrace. 

Even laptops typically leave vmcore after crash. distro support packages have
clear rules what scripts to run to collect all the necessary data in case
of the crash from vmcore.
These tools support extracting everything bpf related too.
For example see:
Documentation/bpf/drgn.rst
It works on vmcore and on the running kernel.

> If a tool or root kit, added function error injection, it
> would be extremely useful information to debug what happened.
> 
> I don't understand the push back here.

All these years we've been working on improving bpf introspection and
debuggability. Today crash dumps look like this:
  bpf_trace_printk+0xd3/0x170 kernel/trace/bpf_trace.c:377
  bpf_prog_cf2ac6d483d8499b_trace_bpf_trace_printk+0x2b/0x37
  bpf_dispatcher_nop_func include/linux/bpf.h:1082 [inline]
  __bpf_prog_run include/linux/filter.h:600 [inline]
  bpf_prog_run include/linux/filter.h:607 [inline]

The 2nd from the top is a bpf prog. The rest are kernel functions.
bpf_prog_cf2ac6d483d8499b_trace_bpf_trace_printk
         ^^ is a prog tag   ^^ name of bpf prog

If you do 'bpftool prog show' you can see both tag and name. 
'bpftool prog dump jited'
dumps x86 code mixed with source line text.
Often enough +0x2b offset will have some C code right next to it.

One can monitor all prog load/unload via perf or via audit.
'perf record' collects all bpf progs that were loaded before
the start and even thouse that were loaded and unloaded quickly
while 'perf record' was running.
So 'perf report' has all the data including annotation and source code.

Currently we're working on adding 'file.c:lineno' to dump_stack()
for bpf progs.

If you have ideas how we can improve debuggability we are all ears.
  
Steven Rostedt Dec. 8, 2022, 2:59 p.m. UTC | #12
On Wed, 7 Dec 2022 20:36:28 -0800
Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:

> > But in all seriousness, what I would actually ask (and what I'll ask now)
> > is, what module did you use that is out of tree, and was it relevant to
> > this test?
> > 
> > That's a reasonable question to ask, and one that only gets asked with a
> > taint.  
> 
> When we receive bug reports in bpf@vger the only question we ask is:
> "How to reproduce this bug?"
> We ignoring taint completely.

Um, I'm guessing that bug reports that go to bpf@vger are focused on issues
with BPF and not core kernel infrastructure. No, I do not consider BPF (nor
tracing for that matter) as core infrastructure. The bug reports I'm
talking about is for the scheduler, exception handling, interrupts, VFS,
MM, and networking. When one of these have a bug report, you most
definitely do look at taints. There's been several times where a bug report
was caused by an out of tree module, or a machine check exception.

> tbh I didn't even know that our BPF CI causes taint until that email.

That's because BPF development probably never gets hit by things that cause
taints. That doesn't go for the rest of the kernel.

> 
> In the previous email I said that the bug report comes from bpf selftest on bpf-next.
> Clearly there is no ambiguity that this is as upstream as it can get.
> Yet for 2 days this 'taint' arguing is preventing people from looking at the bug.

Who isn't looking at the bug because of the 'taint' argument? I'm not
looking at it right now, because it isn't top of my priority list.


> And that happens all the time on lkml. Somebody reports a bug and kernel devs
> jump on the poor person:
> "Can you repro without taint?",
> "Can you repro with upstream kernel?"
> This is discouraging.

So is spending 5 days debugging something and then finding out that what
caused the taint *was* the culprit!

Sorry, but that has happened to me too many times. Which is why you get
grumpy kernel developers pushing back on this.

> The 'taint' concept makes it easier for kernel devs to ignore bug reports
> and push back on the reporter. Do it few times and people stop reporting bugs.
> Say, this particular bug in rethook was found by one of our BPF CI developers.
> They're not very familiar with the kernel, but they can see plenty of 'rethook'
> references in the stack trace, lookup MAINTAINER file and ping Massami,
> but to the question "can you repro without taint?" they can only say NO,
> because this is how our CI works. So they will keep silence and the bug will be lost.
> That's not the only reason why I'm against generalizing 'taint'.
> Tainting because HW is misbehaving makes sense, but tainting because
> of OoO module or because of live-patching does not.
> It becomes an excuse that people abuse.

Again, I've been burned by OoO modules more than once. If you send 5 days
debugging something to find out that the cause was from an OoO module,
you'd change your tune too.

> 
> Right now syzbot is finding all sorts of bugs. Most of the time syzbot
> turns error injection on to find those allocation issues.
> If syzbot reports will start coming as tainted there will be even less
> attention to them. That will not be good.
> 
> > If there's a BPF injection taint, one can ask that same question, as the
> > bug may happen sometime after the injection but be caused by that injection,
> > and not be in the backtrace. Not all kernel developers have the access to
> > debugging utilities that backend production servers have. A lot of bugs that
> > kernel developers debug are from someone's laptop.  
> 
> I would have to disagree.
> We see few human reported bugs and prioritize them higher,
> but majority are coming from the production fleet, test tiers,
> syzbot, CIs, and automated things.
> 
> > Where all they have is
> > that backtrace.   
> 
> Even laptops typically leave vmcore after crash. distro support packages have
> clear rules what scripts to run to collect all the necessary data in case
> of the crash from vmcore.

They are not as common as you think.

> These tools support extracting everything bpf related too.
> For example see:
> Documentation/bpf/drgn.rst
> It works on vmcore and on the running kernel.
> 
> > If a tool or root kit, added function error injection, it
> > would be extremely useful information to debug what happened.
> > 
> > I don't understand the push back here.  
> 
> All these years we've been working on improving bpf introspection and
> debuggability. Today crash dumps look like this:
>   bpf_trace_printk+0xd3/0x170 kernel/trace/bpf_trace.c:377
>   bpf_prog_cf2ac6d483d8499b_trace_bpf_trace_printk+0x2b/0x37
>   bpf_dispatcher_nop_func include/linux/bpf.h:1082 [inline]
>   __bpf_prog_run include/linux/filter.h:600 [inline]
>   bpf_prog_run include/linux/filter.h:607 [inline]
> 
> The 2nd from the top is a bpf prog. The rest are kernel functions.
> bpf_prog_cf2ac6d483d8499b_trace_bpf_trace_printk
>          ^^ is a prog tag   ^^ name of bpf prog
> 
> If you do 'bpftool prog show' you can see both tag and name. 
> 'bpftool prog dump jited'
> dumps x86 code mixed with source line text.
> Often enough +0x2b offset will have some C code right next to it.
> 
> One can monitor all prog load/unload via perf or via audit.
> 'perf record' collects all bpf progs that were loaded before
> the start and even thouse that were loaded and unloaded quickly
> while 'perf record' was running.
> So 'perf report' has all the data including annotation and source code.
> 
> Currently we're working on adding 'file.c:lineno' to dump_stack()
> for bpf progs.
> 
> If you have ideas how we can improve debuggability we are all ears.

I talked with KP on IRC, and he said he's going to work on the same thing
as above. That is, showing what BPF programs are loaded, and if they modify
the return of any function, as well as what those functions are. And have
that reported in the oops message.

I'm not hell bent on BPF triggering a taint, as long as the oops message
explicitly defines what BPF is doing. I think that's much more informative
than a taint, and I would greatly welcome that. The main issue I'm raising
is that I want the oops message to have most the information to gather what
went wrong, and not rely on other tooling like kernel core dumps and drgn
for debugging the problem.

-- Steve
  
Masami Hiramatsu (Google) Dec. 11, 2022, 2:52 a.m. UTC | #13
Hi Alexei,

On Wed, 7 Dec 2022 20:36:28 -0800
Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:

> Yet for 2 days this 'taint' arguing is preventing people from looking at the bug.
> And that happens all the time on lkml. Somebody reports a bug and kernel devs
> jump on the poor person:
> "Can you repro without taint?",
> "Can you repro with upstream kernel?"
> This is discouraging.
> The 'taint' concept makes it easier for kernel devs to ignore bug reports
> and push back on the reporter.
> Do it few times and people stop reporting bugs.

That seems off topic for me. You seems complained against the taint flag
itself.

> Say, this particular bug in rethook was found by one of our BPF CI developers.
> They're not very familiar with the kernel, but they can see plenty of 'rethook'
> references in the stack trace, lookup MAINTAINER file and ping Massami,
> but to the question "can you repro without taint?" they can only say NO,
> because this is how our CI works. So they will keep silence and the bug will be lost.

BTW, this sounds like the BPF CI system design issue. If user is NOT easily
identifying what test caused the issue (e.g. what tests ran on the system
until the bug was found), the CI system is totally useless, because after
finding a problem, it must be investigated to solve the problem.

Without investigation, how would you usually fix the bug??

> That's not the only reason why I'm against generalizing 'taint'.
> Tainting because HW is misbehaving makes sense, but tainting because
> of OoO module or because of live-patching does not.
> It becomes an excuse that people abuse.

yeah, it is possible to be abused. but that is the problem who
abuse it.

> Right now syzbot is finding all sorts of bugs. Most of the time syzbot
> turns error injection on to find those allocation issues.
> If syzbot reports will start coming as tainted there will be even less
> attention to them. That will not be good.

Hmm, what kind of error injection does syzbot do? I would like to know
how it is used. For example, does that use only a specify set of
injection points, or use all existing points?

If the latter, I feel safer because syzbot ensures the current all
ALLOW_ERROR_INJECTION() functions will work with error injection. If not,
we need to consider removing the ALLOW_ERROR_INJECTION() from the
function which is not tested well (or add this taint flag.)

Documentation/fault-injection/fault-injection.rst has no explanation
about ALLOW_ERROR_INJECTION(), but obviously the ALLOW_ERROR_INJECTION()
marked functions and its caller MUST be designed safely against the
error injection. e.g.

- It must return an error code. (so EI_ETYPE_NONE must be removed)
- Caller must check the return value always.
  (but I thought this was the reason why we need this test framework...)
- It should not run any 'effective' code before checking an error.
  For example, increment counter, call other functions etc.
  (this means it can return without any side-effect)

Anything else?

[...]
> All these years we've been working on improving bpf introspection and
> debuggability. Today crash dumps look like this:
>   bpf_trace_printk+0xd3/0x170 kernel/trace/bpf_trace.c:377
>   bpf_prog_cf2ac6d483d8499b_trace_bpf_trace_printk+0x2b/0x37
>   bpf_dispatcher_nop_func include/linux/bpf.h:1082 [inline]
>   __bpf_prog_run include/linux/filter.h:600 [inline]
>   bpf_prog_run include/linux/filter.h:607 [inline]
> 
> The 2nd from the top is a bpf prog. The rest are kernel functions.
> bpf_prog_cf2ac6d483d8499b_trace_bpf_trace_printk
>          ^^ is a prog tag   ^^ name of bpf prog
> 
> If you do 'bpftool prog show' you can see both tag and name. 
> 'bpftool prog dump jited'
> dumps x86 code mixed with source line text.
> Often enough +0x2b offset will have some C code right next to it.

This is good, but this only works when the vmcore is dumped and
on the stack. My concern about the function error injection is
that makes some side effects, which can cause a problem afterwards
(this means after unloading the bpf prog)

> 
> One can monitor all prog load/unload via perf or via audit.

Ah, audit is helpful :), because we can dig the log what was loaded
before crash.


Thank you,
  
KP Singh Dec. 11, 2022, 7:49 a.m. UTC | #14
On Sun, Dec 11, 2022 at 3:52 AM Masami Hiramatsu <mhiramat@kernel.org> wrote:
>
> Hi Alexei,
>
> On Wed, 7 Dec 2022 20:36:28 -0800
> Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
>
> > Yet for 2 days this 'taint' arguing is preventing people from looking at the bug.
> > And that happens all the time on lkml. Somebody reports a bug and kernel devs
> > jump on the poor person:
> > "Can you repro without taint?",
> > "Can you repro with upstream kernel?"
> > This is discouraging.
> > The 'taint' concept makes it easier for kernel devs to ignore bug reports
> > and push back on the reporter.
> > Do it few times and people stop reporting bugs.
>
> That seems off topic for me. You seems complained against the taint flag
> itself.

The series is about adding a taint for, so discussing the user
experience, when someone reports a "tainted crash" seems reasonable to
me and not off topic.

>
> > Say, this particular bug in rethook was found by one of our BPF CI developers.
> > They're not very familiar with the kernel, but they can see plenty of 'rethook'
> > references in the stack trace, lookup MAINTAINER file and ping Massami,
> > but to the question "can you repro without taint?" they can only say NO,
> > because this is how our CI works. So they will keep silence and the bug will be lost.
>
> BTW, this sounds like the BPF CI system design issue. If user is NOT easily
> identifying what test caused the issue (e.g. what tests ran on the system
> until the bug was found), the CI system is totally useless, because after
> finding a problem, it must be investigated to solve the problem.
>
> Without investigation, how would you usually fix the bug??

Masami, this seems accusational and counter productive, it was never
said that issues can be solved without investigation.

The BPF CI does find issues, the BPF reviewers and maintainers
regularly fix bugs using it. Alexei's point here is that a taint does
not help in solving the problem, rather deter some people from even
looking at it. (not BPF people, but other maintainers [distro, kernel]
who would ask for a reproduction without a taint).

Let's take a step back and focus on solving debuggability and
introspection as we clearly have some perception issues about taints
in the community. (distro maintainers, users) before we go and add
more taints.

>
> > That's not the only reason why I'm against generalizing 'taint'.
> > Tainting because HW is misbehaving makes sense, but tainting because
> > of OoO module or because of live-patching does not.
> > It becomes an excuse that people abuse.
>
> yeah, it is possible to be abused. but that is the problem who
> abuse it.

I am sorry, but it's our responsibility as developers to design
features so that users don't face arduous pushbacks.

> > Right now syzbot is finding all sorts of bugs. Most of the time syzbot
> > turns error injection on to find those allocation issues.
> > If syzbot reports will start coming as tainted there will be even less
> > attention to them. That will not be good.
>
> Hmm, what kind of error injection does syzbot do? I would like to know
> how it is used. For example, does that use only a specify set of
> injection points, or use all existing points?
>
> If the latter, I feel safer because syzbot ensures the current all
> ALLOW_ERROR_INJECTION() functions will work with error injection. If not,
> we need to consider removing the ALLOW_ERROR_INJECTION() from the
> function which is not tested well (or add this taint flag.)
>
> Documentation/fault-injection/fault-injection.rst has no explanation
> about ALLOW_ERROR_INJECTION(), but obviously the ALLOW_ERROR_INJECTION()
> marked functions and its caller MUST be designed safely against the
> error injection. e.g.
>
> - It must return an error code. (so EI_ETYPE_NONE must be removed)

This is already the case with BPF, the modify return trampolines
further limits the error injection to functions that return errors.

> - Caller must check the return value always.
>   (but I thought this was the reason why we need this test framework...)
> - It should not run any 'effective' code before checking an error.
>   For example, increment counter, call other functions etc.
>   (this means it can return without any side-effect)

This is the case with modify_return trampolines in BPF which avoid
side effects and limit the attachment surface further and avoiding
side effects is a design goal. If we missed anything, let's fix that.

https://lwn.net/Articles/813724/https://lwn.net/Articles/813724/

>
> Anything else?
>
> [...]
> > All these years we've been working on improving bpf introspection and
> > debuggability. Today crash dumps look like this:
> >   bpf_trace_printk+0xd3/0x170 kernel/trace/bpf_trace.c:377
> >   bpf_prog_cf2ac6d483d8499b_trace_bpf_trace_printk+0x2b/0x37
> >   bpf_dispatcher_nop_func include/linux/bpf.h:1082 [inline]
> >   __bpf_prog_run include/linux/filter.h:600 [inline]
> >   bpf_prog_run include/linux/filter.h:607 [inline]
> >
> > The 2nd from the top is a bpf prog. The rest are kernel functions.
> > bpf_prog_cf2ac6d483d8499b_trace_bpf_trace_printk
> >          ^^ is a prog tag   ^^ name of bpf prog
> >
> > If you do 'bpftool prog show' you can see both tag and name.
> > 'bpftool prog dump jited'
> > dumps x86 code mixed with source line text.
> > Often enough +0x2b offset will have some C code right next to it.
>
> This is good, but this only works when the vmcore is dumped and
> on the stack. My concern about the function error injection is
> that makes some side effects, which can cause a problem afterwards
> (this means after unloading the bpf prog)

I think careful choices need to be made on when error injection is
allowed so that these situations don't occur. (as you mentioned in
your comment). [1]. If a BPF program is unloaded, there is no error
injection any more, let's ensure that we design the error injection
allow list and the BPF logic to ensure this cannot happen.


>
> >
> > One can monitor all prog load/unload via perf or via audit.

I would like us to focus on debuggability as it helps both the
maintainers and the user. And I see a few things that need to be done:

1. Revisit what is allowed for error injection in the kernel and if
they can cause any subtle issues. My initial take is that functions
that are directly called from syscall path should generally be okay.
But let's check them for the patterns you mentioned.
2. If it helps, add the list of BPF modify return programs to stack
traces. Although this is really needed if we don't do [1] properly.
3. Check if anything needs to be improved in the verification logic
for modify return trampolines.

- KP

>
> Ah, audit is helpful :), because we can dig the log what was loaded
> before crash.
>
>
> Thank you,
>
> --
> Masami Hiramatsu (Google) <mhiramat@kernel.org>
  
Masami Hiramatsu (Google) Dec. 11, 2022, 3:14 p.m. UTC | #15
On Sun, 11 Dec 2022 08:49:01 +0100
KP Singh <kpsingh@kernel.org> wrote:

> On Sun, Dec 11, 2022 at 3:52 AM Masami Hiramatsu <mhiramat@kernel.org> wrote:
> >
> > Hi Alexei,
> >
> > On Wed, 7 Dec 2022 20:36:28 -0800
> > Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> >
> > > Yet for 2 days this 'taint' arguing is preventing people from looking at the bug.
> > > And that happens all the time on lkml. Somebody reports a bug and kernel devs
> > > jump on the poor person:
> > > "Can you repro without taint?",
> > > "Can you repro with upstream kernel?"
> > > This is discouraging.
> > > The 'taint' concept makes it easier for kernel devs to ignore bug reports
> > > and push back on the reporter.
> > > Do it few times and people stop reporting bugs.
> >
> > That seems off topic for me. You seems complained against the taint flag
> > itself.
> 
> The series is about adding a taint for, so discussing the user
> experience, when someone reports a "tainted crash" seems reasonable to
> me and not off topic.
> 
> >
> > > Say, this particular bug in rethook was found by one of our BPF CI developers.
> > > They're not very familiar with the kernel, but they can see plenty of 'rethook'
> > > references in the stack trace, lookup MAINTAINER file and ping Massami,
> > > but to the question "can you repro without taint?" they can only say NO,
> > > because this is how our CI works. So they will keep silence and the bug will be lost.
> >
> > BTW, this sounds like the BPF CI system design issue. If user is NOT easily
> > identifying what test caused the issue (e.g. what tests ran on the system
> > until the bug was found), the CI system is totally useless, because after
> > finding a problem, it must be investigated to solve the problem.
> >
> > Without investigation, how would you usually fix the bug??
> 
> Masami, this seems accusational and counter productive, it was never
> said that issues can be solved without investigation.

Let me apologies about my misunderstanding.  

> 
> The BPF CI does find issues, the BPF reviewers and maintainers
> regularly fix bugs using it. Alexei's point here is that a taint does
> not help in solving the problem, rather deter some people from even
> looking at it. (not BPF people, but other maintainers [distro, kernel]
> who would ask for a reproduction without a taint).

Hmm, that is a problem. Some taint flag should be useful hints
for finding the error patterns.

> Let's take a step back and focus on solving debuggability and
> introspection as we clearly have some perception issues about taints
> in the community. (distro maintainers, users) before we go and add
> more taints.

Agreed.

> > > That's not the only reason why I'm against generalizing 'taint'.
> > > Tainting because HW is misbehaving makes sense, but tainting because
> > > of OoO module or because of live-patching does not.
> > > It becomes an excuse that people abuse.
> >
> > yeah, it is possible to be abused. but that is the problem who
> > abuse it.
> 
> I am sorry, but it's our responsibility as developers to design
> features so that users don't face arduous pushbacks.

Sorry if I confuse you. I meant that taint flag abusing. :(

> 
> > > Right now syzbot is finding all sorts of bugs. Most of the time syzbot
> > > turns error injection on to find those allocation issues.
> > > If syzbot reports will start coming as tainted there will be even less
> > > attention to them. That will not be good.
> >
> > Hmm, what kind of error injection does syzbot do? I would like to know
> > how it is used. For example, does that use only a specify set of
> > injection points, or use all existing points?
> >
> > If the latter, I feel safer because syzbot ensures the current all
> > ALLOW_ERROR_INJECTION() functions will work with error injection. If not,
> > we need to consider removing the ALLOW_ERROR_INJECTION() from the
> > function which is not tested well (or add this taint flag.)
> >
> > Documentation/fault-injection/fault-injection.rst has no explanation
> > about ALLOW_ERROR_INJECTION(), but obviously the ALLOW_ERROR_INJECTION()
> > marked functions and its caller MUST be designed safely against the
> > error injection. e.g.
> >
> > - It must return an error code. (so EI_ETYPE_NONE must be removed)
> 
> This is already the case with BPF, the modify return trampolines
> further limits the error injection to functions that return errors.

OK, so I also should remove it from FEI.

> 
> > - Caller must check the return value always.
> >   (but I thought this was the reason why we need this test framework...)
> > - It should not run any 'effective' code before checking an error.
> >   For example, increment counter, call other functions etc.
> >   (this means it can return without any side-effect)
> 
> This is the case with modify_return trampolines in BPF which avoid
> side effects and limit the attachment surface further and avoiding
> side effects is a design goal. If we missed anything, let's fix that.
> 
> https://lwn.net/Articles/813724/

Yeah, if BPF tests already tested all ALLOW_ERROR_INJECTION() functions,
it may be checked already. I think we just need adding the above
explanation on the document, so that the people who will add additional
ALLOW_ERROR_INJECTION() on a function, can understand the limitation.

> 
> >
> > Anything else?
> >
> > [...]
> > > All these years we've been working on improving bpf introspection and
> > > debuggability. Today crash dumps look like this:
> > >   bpf_trace_printk+0xd3/0x170 kernel/trace/bpf_trace.c:377
> > >   bpf_prog_cf2ac6d483d8499b_trace_bpf_trace_printk+0x2b/0x37
> > >   bpf_dispatcher_nop_func include/linux/bpf.h:1082 [inline]
> > >   __bpf_prog_run include/linux/filter.h:600 [inline]
> > >   bpf_prog_run include/linux/filter.h:607 [inline]
> > >
> > > The 2nd from the top is a bpf prog. The rest are kernel functions.
> > > bpf_prog_cf2ac6d483d8499b_trace_bpf_trace_printk
> > >          ^^ is a prog tag   ^^ name of bpf prog
> > >
> > > If you do 'bpftool prog show' you can see both tag and name.
> > > 'bpftool prog dump jited'
> > > dumps x86 code mixed with source line text.
> > > Often enough +0x2b offset will have some C code right next to it.
> >
> > This is good, but this only works when the vmcore is dumped and
> > on the stack. My concern about the function error injection is
> > that makes some side effects, which can cause a problem afterwards
> > (this means after unloading the bpf prog)
> 
> I think careful choices need to be made on when error injection is
> allowed so that these situations don't occur. (as you mentioned in
> your comment). [1]. If a BPF program is unloaded, there is no error
> injection any more, let's ensure that we design the error injection
> allow list and the BPF logic to ensure this cannot happen.

OK. Actually, I trust the BPF logic itself will be handle this
correctly. I just concerned that some people who don't know much
(because it is not carefully documented) might add ALLOW_ERROR_INJECTION()
to a function which is not injectable by design. Thus I thought the
taint flag can help.
But if those are always Cc'd to bpf@vger and it will be tested by BPF
CI, I'm OK for that.

> > > One can monitor all prog load/unload via perf or via audit.
> 
> I would like us to focus on debuggability as it helps both the
> maintainers and the user. And I see a few things that need to be done:
> 
> 1. Revisit what is allowed for error injection in the kernel and if
> they can cause any subtle issues. My initial take is that functions
> that are directly called from syscall path should generally be okay.
> But let's check them for the patterns you mentioned.

Yeah, I agree that syscall entries should be safe.

> 2. If it helps, add the list of BPF modify return programs to stack
> traces. Although this is really needed if we don't do [1] properly.

Would you mean a list of enabled BPF programs in Oops code? If so,
I also want to add enabled FEI list on it.

> 3. Check if anything needs to be improved in the verification logic
> for modify return trampolines.

I think BPF logic itself is safe. But the targeted function itself
or the caller may not be designed for such error injection.
I think this is my fault that I have not documented about
ALLOW_ERROR_INJECTION() well. Sorry about that.

Thank you,
  
Steven Rostedt Dec. 11, 2022, 5:02 p.m. UTC | #16
On Sun, 11 Dec 2022 08:49:01 +0100
KP Singh <kpsingh@kernel.org> wrote:

> Let's take a step back and focus on solving debuggability and
> introspection as we clearly have some perception issues about taints
> in the community. (distro maintainers, users) before we go and add
> more taints.

Note, you will likely get the same push back if the dump includes bpf
programs known to change the return of a function that may be involved
with the bug report. That is, if a crash is reported to code I
maintain, and I see that the bug report includes a list of BPF programs
that can modify the return of a function, and one of those functions
could affect the place that crashed, I'd push back and ask if the crash
could be done without that BPF program loaded, regardless of taints.

I agree that a taint is just a hint and it can include something that
caused the bug or it may not. I would like to see more details in how
the crashed kernel was configured. That includes loaded BPF programs
(just like we include loaded modules). And if any BPF program modifies
a core function (outside of syscall returns) I'd be a bit suspect of
what happened.

I also agree that if a function that checks error paths fails, it
should be fixed, but knowing that the error path was caused by fault
injection will prevent the wasted effort that most developers will go
through to find out why the error path was hit in the first place.

-- Steve
  
Masami Hiramatsu (Google) Dec. 12, 2022, 12:53 a.m. UTC | #17
Hi,

On Sun, 11 Dec 2022 08:49:01 +0100
KP Singh <kpsingh@kernel.org> wrote:

> 1. Revisit what is allowed for error injection in the kernel and if
> they can cause any subtle issues. My initial take is that functions
> that are directly called from syscall path should generally be okay.
> But let's check them for the patterns you mentioned.
> 2. If it helps, add the list of BPF modify return programs to stack
> traces. Although this is really needed if we don't do [1] properly.
> 3. Check if anything needs to be improved in the verification logic
> for modify return trampolines.

Hmm, I found that bpf might not check the acceptable error type of
each ALLOW_ERROR_INJECTION().

Except for EI_ETYPE_NONE, we have 4 types of the error.

        EI_ETYPE_NULL,          /* Return NULL if failure */
        EI_ETYPE_ERRNO,         /* Return -ERRNO if failure */
        EI_ETYPE_ERRNO_NULL,    /* Return -ERRNO or NULL if failure */
        EI_ETYPE_TRUE,          /* Return true if failure */

These specifies that what return value will be treated as an error
by the caller.

If bpf trampoline only expect that the function will return -errno
in error cases, bpf should check the error type as below.

etype = get_injectable_error_type(addr);
if (etype != EI_ETYPE_ERRNO && etype != EI_ETYPE_ERRNO_NULL)
	/* reject it */

If bpf can handle any case, it still need to verify that the user
bpf prog specifies correct return value for each type.
See adjust_error_retval()@kernel/fail_function.c for the available
return values.

Thank you,
  
KP Singh Dec. 12, 2022, 9:39 p.m. UTC | #18
On Sun, Dec 11, 2022 at 4:14 PM Masami Hiramatsu <mhiramat@kernel.org> wrote:
>
> On Sun, 11 Dec 2022 08:49:01 +0100
> KP Singh <kpsingh@kernel.org> wrote:
>
> > On Sun, Dec 11, 2022 at 3:52 AM Masami Hiramatsu <mhiramat@kernel.org> wrote:
> > >
> > > Hi Alexei,
> > >
> > > On Wed, 7 Dec 2022 20:36:28 -0800
> > > Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> > >
> > > > Yet for 2 days this 'taint' arguing is preventing people from looking at the bug.
> > > > And that happens all the time on lkml. Somebody reports a bug and kernel devs
> > > > jump on the poor person:
> > > > "Can you repro without taint?",
> > > > "Can you repro with upstream kernel?"
> > > > This is discouraging.
> > > > The 'taint' concept makes it easier for kernel devs to ignore bug reports
> > > > and push back on the reporter.
> > > > Do it few times and people stop reporting bugs.
> > >
> > > That seems off topic for me. You seems complained against the taint flag
> > > itself.
> >
> > The series is about adding a taint for, so discussing the user
> > experience, when someone reports a "tainted crash" seems reasonable to
> > me and not off topic.
> >
> > >
> > > > Say, this particular bug in rethook was found by one of our BPF CI developers.
> > > > They're not very familiar with the kernel, but they can see plenty of 'rethook'
> > > > references in the stack trace, lookup MAINTAINER file and ping Massami,
> > > > but to the question "can you repro without taint?" they can only say NO,
> > > > because this is how our CI works. So they will keep silence and the bug will be lost.
> > >
> > > BTW, this sounds like the BPF CI system design issue. If user is NOT easily
> > > identifying what test caused the issue (e.g. what tests ran on the system
> > > until the bug was found), the CI system is totally useless, because after
> > > finding a problem, it must be investigated to solve the problem.
> > >
> > > Without investigation, how would you usually fix the bug??
> >
> > Masami, this seems accusational and counter productive, it was never
> > said that issues can be solved without investigation.
>
> Let me apologies about my misunderstanding.

e-mail is hard, I am glad this is already progressing constructively
and we are making things better.

>
> >
> > The BPF CI does find issues, the BPF reviewers and maintainers
> > regularly fix bugs using it. Alexei's point here is that a taint does
> > not help in solving the problem, rather deter some people from even
> > looking at it. (not BPF people, but other maintainers [distro, kernel]
> > who would ask for a reproduction without a taint).
>
> Hmm, that is a problem. Some taint flag should be useful hints
> for finding the error patterns.

the boolean information contained in taints is not helpful, stuff like
audit logs / crash dumps / bpftool reports are much more helpful.
>
> > Let's take a step back and focus on solving debuggability and
> > introspection as we clearly have some perception issues about taints
> > in the community. (distro maintainers, users) before we go and add
> > more taints.
>
> Agreed.
>
> > > > That's not the only reason why I'm against generalizing 'taint'.
> > > > Tainting because HW is misbehaving makes sense, but tainting because
> > > > of OoO module or because of live-patching does not.
> > > > It becomes an excuse that people abuse.
> > >
> > > yeah, it is possible to be abused. but that is the problem who
> > > abuse it.
> >
> > I am sorry, but it's our responsibility as developers to design
> > features so that users don't face arduous pushbacks.
>
> Sorry if I confuse you. I meant that taint flag abusing. :(

Again no worries, but we need to make sure that the taint flag is not
abused, and if it is being abused, limit the damage somewhere.

>
> >
> > > > Right now syzbot is finding all sorts of bugs. Most of the time syzbot
> > > > turns error injection on to find those allocation issues.
> > > > If syzbot reports will start coming as tainted there will be even less
> > > > attention to them. That will not be good.
> > >
> > > Hmm, what kind of error injection does syzbot do? I would like to know
> > > how it is used. For example, does that use only a specify set of
> > > injection points, or use all existing points?
> > >
> > > If the latter, I feel safer because syzbot ensures the current all
> > > ALLOW_ERROR_INJECTION() functions will work with error injection. If not,
> > > we need to consider removing the ALLOW_ERROR_INJECTION() from the
> > > function which is not tested well (or add this taint flag.)
> > >
> > > Documentation/fault-injection/fault-injection.rst has no explanation
> > > about ALLOW_ERROR_INJECTION(), but obviously the ALLOW_ERROR_INJECTION()
> > > marked functions and its caller MUST be designed safely against the
> > > error injection. e.g.
> > >
> > > - It must return an error code. (so EI_ETYPE_NONE must be removed)
> >
> > This is already the case with BPF, the modify return trampolines
> > further limits the error injection to functions that return errors.
>
> OK, so I also should remove it from FEI.
>
> >
> > > - Caller must check the return value always.
> > >   (but I thought this was the reason why we need this test framework...)
> > > - It should not run any 'effective' code before checking an error.
> > >   For example, increment counter, call other functions etc.
> > >   (this means it can return without any side-effect)
> >
> > This is the case with modify_return trampolines in BPF which avoid
> > side effects and limit the attachment surface further and avoiding
> > side effects is a design goal. If we missed anything, let's fix that.
> >
> > https://lwn.net/Articles/813724/
>
> Yeah, if BPF tests already tested all ALLOW_ERROR_INJECTION() functions,
> it may be checked already. I think we just need adding the above
> explanation on the document, so that the people who will add additional
> ALLOW_ERROR_INJECTION() on a function, can understand the limitation.
>
> >
> > >
> > > Anything else?
> > >
> > > [...]
> > > > All these years we've been working on improving bpf introspection and
> > > > debuggability. Today crash dumps look like this:
> > > >   bpf_trace_printk+0xd3/0x170 kernel/trace/bpf_trace.c:377
> > > >   bpf_prog_cf2ac6d483d8499b_trace_bpf_trace_printk+0x2b/0x37
> > > >   bpf_dispatcher_nop_func include/linux/bpf.h:1082 [inline]
> > > >   __bpf_prog_run include/linux/filter.h:600 [inline]
> > > >   bpf_prog_run include/linux/filter.h:607 [inline]
> > > >
> > > > The 2nd from the top is a bpf prog. The rest are kernel functions.
> > > > bpf_prog_cf2ac6d483d8499b_trace_bpf_trace_printk
> > > >          ^^ is a prog tag   ^^ name of bpf prog
> > > >
> > > > If you do 'bpftool prog show' you can see both tag and name.
> > > > 'bpftool prog dump jited'
> > > > dumps x86 code mixed with source line text.
> > > > Often enough +0x2b offset will have some C code right next to it.
> > >
> > > This is good, but this only works when the vmcore is dumped and
> > > on the stack. My concern about the function error injection is
> > > that makes some side effects, which can cause a problem afterwards
> > > (this means after unloading the bpf prog)
> >
> > I think careful choices need to be made on when error injection is
> > allowed so that these situations don't occur. (as you mentioned in
> > your comment). [1]. If a BPF program is unloaded, there is no error
> > injection any more, let's ensure that we design the error injection
> > allow list and the BPF logic to ensure this cannot happen.
>
> OK. Actually, I trust the BPF logic itself will be handle this
> correctly. I just concerned that some people who don't know much
> (because it is not carefully documented) might add ALLOW_ERROR_INJECTION()
> to a function which is not injectable by design. Thus I thought the
> taint flag can help.

We should definitely carefully review any new ALLOW_ERROR_INJECTION
functions, that's the real value add.

> But if those are always Cc'd to bpf@vger and it will be tested by BPF
> CI, I'm OK for that.
>
> > > > One can monitor all prog load/unload via perf or via audit.
> >
> > I would like us to focus on debuggability as it helps both the
> > maintainers and the user. And I see a few things that need to be done:
> >
> > 1. Revisit what is allowed for error injection in the kernel and if
> > they can cause any subtle issues. My initial take is that functions
> > that are directly called from syscall path should generally be okay.
> > But let's check them for the patterns you mentioned.
>
> Yeah, I agree that syscall entries should be safe.
>
> > 2. If it helps, add the list of BPF modify return programs to stack
> > traces. Although this is really needed if we don't do [1] properly.
>
> Would you mean a list of enabled BPF programs in Oops code? If so,
> I also want to add enabled FEI list on it.
>
> > 3. Check if anything needs to be improved in the verification logic
> > for modify return trampolines.
>
> I think BPF logic itself is safe. But the targeted function itself
> or the caller may not be designed for such error injection.
> I think this is my fault that I have not documented about
> ALLOW_ERROR_INJECTION() well. Sorry about that.

We aim to be blameless and constructive :)

Thanks again for sending the patches!

Cheers,
- KP

>
> Thank you,
>
>
> --
> Masami Hiramatsu (Google) <mhiramat@kernel.org>
  
KP Singh Dec. 12, 2022, 9:43 p.m. UTC | #19
On Sun, Dec 11, 2022 at 6:02 PM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Sun, 11 Dec 2022 08:49:01 +0100
> KP Singh <kpsingh@kernel.org> wrote:
>
> > Let's take a step back and focus on solving debuggability and
> > introspection as we clearly have some perception issues about taints
> > in the community. (distro maintainers, users) before we go and add
> > more taints.
>
> Note, you will likely get the same push back if the dump includes bpf
> programs known to change the return of a function that may be involved
> with the bug report. That is, if a crash is reported to code I
> maintain, and I see that the bug report includes a list of BPF programs
> that can modify the return of a function, and one of those functions
> could affect the place that crashed, I'd push back and ask if the crash
> could be done without that BPF program loaded, regardless of taints.
>

I think this is already better as it gives the recipient of the stack
trace more information to ask more questions and see if the BPF
programs are relevant to the stack trace and engage further.

> I agree that a taint is just a hint and it can include something that
> caused the bug or it may not. I would like to see more details in how
> the crashed kernel was configured. That includes loaded BPF programs
> (just like we include loaded modules). And if any BPF program modifies
> a core function (outside of syscall returns) I'd be a bit suspect of
> what happened.

Agreed.

>
> I also agree that if a function that checks error paths fails, it
> should be fixed, but knowing that the error path was caused by fault
> injection will prevent the wasted effort that most developers will go
> through to find out why the error path was hit in the first place.

Agreed.

>
> -- Steve
  

Patch

diff --git a/Documentation/admin-guide/tainted-kernels.rst b/Documentation/admin-guide/tainted-kernels.rst
index 92a8a07f5c43..63d7cd4f6250 100644
--- a/Documentation/admin-guide/tainted-kernels.rst
+++ b/Documentation/admin-guide/tainted-kernels.rst
@@ -101,6 +101,7 @@  Bit  Log  Number  Reason that got the kernel tainted
  16  _/X   65536  auxiliary taint, defined for and used by distros
  17  _/T  131072  kernel was built with the struct randomization plugin
  18  _/N  262144  an in-kernel test has been run
+ 19  _/J  524288  a function-level error has been injected
 ===  ===  ======  ========================================================
 
 Note: The character ``_`` is representing a blank in this table to make reading
@@ -182,3 +183,7 @@  More detailed explanation for tainting
      produce extremely unusual kernel structure layouts (even performance
      pathological ones), which is important to know when debugging. Set at
      build time.
+
+ 19) ``J`` if a function-level error has been injected and the code path was
+     forcibly changed by either function error injection framework or BPF's
+     function override feature.
diff --git a/include/linux/panic.h b/include/linux/panic.h
index c7759b3f2045..2b03a02d86be 100644
--- a/include/linux/panic.h
+++ b/include/linux/panic.h
@@ -69,7 +69,8 @@  static inline void set_arch_panic_timeout(int timeout, int arch_default_timeout)
 #define TAINT_AUX			16
 #define TAINT_RANDSTRUCT		17
 #define TAINT_TEST			18
-#define TAINT_FLAGS_COUNT		19
+#define TAINT_FAULT_INJECTED		19
+#define TAINT_FLAGS_COUNT		20
 #define TAINT_FLAGS_MAX			((1UL << TAINT_FLAGS_COUNT) - 1)
 
 struct taint_flag {
diff --git a/kernel/fail_function.c b/kernel/fail_function.c
index a7ccd2930c5f..80a743f14a2c 100644
--- a/kernel/fail_function.c
+++ b/kernel/fail_function.c
@@ -9,6 +9,7 @@ 
 #include <linux/kprobes.h>
 #include <linux/module.h>
 #include <linux/mutex.h>
+#include <linux/panic.h>
 #include <linux/slab.h>
 #include <linux/uaccess.h>
 
@@ -298,6 +299,7 @@  static ssize_t fei_write(struct file *file, const char __user *buffer,
 		fei_attr_free(attr);
 		goto out;
 	}
+	add_taint(TAINT_FAULT_INJECTED, LOCKDEP_NOW_UNRELIABLE);
 	fei_debugfs_add_attr(attr);
 	list_add_tail(&attr->list, &fei_attr_list);
 	ret = count;
diff --git a/kernel/panic.c b/kernel/panic.c
index da323209f583..e396a5fd9bb6 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -426,6 +426,7 @@  const struct taint_flag taint_flags[TAINT_FLAGS_COUNT] = {
 	[ TAINT_AUX ]			= { 'X', ' ', true },
 	[ TAINT_RANDSTRUCT ]		= { 'T', ' ', true },
 	[ TAINT_TEST ]			= { 'N', ' ', true },
+	[ TAINT_FAULT_INJECTED ]	= { 'J', ' ', false },
 };
 
 /**
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 1ed08967fb97..de0614d9796c 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -2137,6 +2137,8 @@  int perf_event_attach_bpf_prog(struct perf_event *event,
 		goto unlock;
 
 	/* set the new array to event->tp_event and set event->prog */
+	if (prog->kprobe_override)
+		add_taint(TAINT_FAULT_INJECTED, LOCKDEP_NOW_UNRELIABLE);
 	event->prog = prog;
 	event->bpf_cookie = bpf_cookie;
 	rcu_assign_pointer(event->tp_event->prog_array, new_array);
diff --git a/tools/debugging/kernel-chktaint b/tools/debugging/kernel-chktaint
index 279be06332be..5eb563766041 100755
--- a/tools/debugging/kernel-chktaint
+++ b/tools/debugging/kernel-chktaint
@@ -204,6 +204,14 @@  else
 	echo " * an in-kernel test (such as a KUnit test) has been run (#18)"
 fi
 
+T=`expr $T / 2`
+if [ `expr $T % 2` -eq 0 ]; then
+	addout " "
+else
+	addout "J"
+	echo " * a function level error has been injected (#19)"
+fi
+
 echo "For a more detailed explanation of the various taint flags see"
 echo " Documentation/admin-guide/tainted-kernels.rst in the Linux kernel sources"
 echo " or https://kernel.org/doc/html/latest/admin-guide/tainted-kernels.html"