diff mbox series

kexec: Fix reboot race during device_shutdown()

Message ID	20230929021213.2364883-1-joel@joelfernandes.org
State	New
Headers	Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.33 as permitted sender) client-ip=23.128.96.33; From: "Joel Fernandes (Google)" <joel@joelfernandes.org> To: linux-kernel@vger.kernel.org, Eric Biederman <ebiederm@xmission.com> Cc: "Joel Fernandes (Google)" <joel@joelfernandes.org>, Steven Rostedt <rostedt@goodmis.org>, Ricardo Ribalda <ribalda@google.com>, Ross Zwisler <zwisler@google.com>, Rob Clark <robdclark@gmail.com>, Linus Torvalds <torvalds@linux-foundation.org>, kexec@lists.infradead.org Subject: [PATCH] kexec: Fix reboot race during device_shutdown() Date: Fri, 29 Sep 2023 02:12:12 +0000 Message-ID: <20230929021213.2364883-1-joel@joelfernandes.org> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk
Series	kexec: Fix reboot race during device_shutdown() \| kexec: Fix reboot race during device_shutdown()

Commit Message

Joel Fernandes Sept. 29, 2023, 2:12 a.m. UTC

  During kexec reboot, it is possible for a race to occur between
device_shutdown() and userspace.  This causes accesses to GPU after pm_runtime
suspend has already happened. Fix this by calling freeze_processes() before
device_shutdown().

Such freezing is already being done if kernel supports KEXEC_JUMP and
kexec_image->preserve_context is true. However, doing it if either of these are
not true prevents crashes/races.

This fixes the following crash during kexec reboot on an ARM64 device
with adreno GPU:

[  292.534314] Kernel panic - not syncing: Asynchronous SError Interrupt
[  292.534323] Hardware name: Google Lazor (rev3 - 8) with LTE (DT)
[  292.534326] Call trace:
[  292.534328]  dump_backtrace+0x0/0x1d4
[  292.534337]  show_stack+0x20/0x2c
[  292.534342]  dump_stack_lvl+0x60/0x78
[  292.534347]  dump_stack+0x18/0x38
[  292.534352]  panic+0x148/0x3b0
[  292.534357]  nmi_panic+0x80/0x94
[  292.534364]  arm64_serror_panic+0x70/0x7c
[  292.534369]  do_serror+0x0/0x7c
[  292.534372]  do_serror+0x54/0x7c
[  292.534377]  el1h_64_error_handler+0x34/0x4c
[  292.534381]  el1h_64_error+0x7c/0x80
[  292.534386]  el1_interrupt+0x20/0x58
[  292.534389]  el1h_64_irq_handler+0x18/0x24
[  292.534395]  el1h_64_irq+0x7c/0x80
[  292.534399]  local_daif_inherit+0x10/0x18
[  292.534405]  el1h_64_sync_handler+0x48/0xb4
[  292.534410]  el1h_64_sync+0x7c/0x80
[  292.534414]  a6xx_gmu_set_oob+0xbc/0x1fc
[  292.534422]  a6xx_get_timestamp+0x40/0xb4
[  292.534426]  adreno_get_param+0x12c/0x1e0
[  292.534433]  msm_ioctl_get_param+0x64/0x70
[  292.534440]  drm_ioctl_kernel+0xe8/0x158
[  292.534448]  drm_ioctl+0x208/0x320
[  292.534453]  __arm64_sys_ioctl+0x98/0xd0
[  292.534461]  invoke_syscall+0x4c/0x118

Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ricardo Ribalda <ribalda@google.com>
Cc: Ross Zwisler <zwisler@google.com>
Cc: Rob Clark <robdclark@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Tested-by: Ricardo Ribalda <ribalda@google.com>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 kernel/kexec_core.c | 6 ++++++
 1 file changed, 6 insertions(+)

Comments

Eric W. Biederman Sept. 29, 2023, 4 p.m. UTC | #1

"Joel Fernandes (Google)" <joel@joelfernandes.org> writes:

> During kexec reboot, it is possible for a race to occur between
> device_shutdown() and userspace.  This causes accesses to GPU after pm_runtime
> suspend has already happened. Fix this by calling freeze_processes() before
> device_shutdown().

Is there any reason why this same race with between sys_kexec and the
adreno_ioctl can not happen during a normal reboot?

Is there any reason why there is not a .shutdown method to prevent the
race?

I would think the thing to do is to prevent this race in
kernel_restart_prepare or in the GPUs .shutdown method.  As I don't see
anything that would prevent this during a normal reboot.

>
> Such freezing is already being done if kernel supports KEXEC_JUMP and
> kexec_image->preserve_context is true. However, doing it if either of these are
> not true prevents crashes/races.

The KEXEC_JUMP case is something else entirely.  It is supposed to work
like suspend to RAM.  Maybe reboot should as well, but I am
uncomfortable making a generic device fix kexec specific.


> This fixes the following crash during kexec reboot on an ARM64 device
> with adreno GPU:
>
> [  292.534314] Kernel panic - not syncing: Asynchronous SError Interrupt
> [  292.534323] Hardware name: Google Lazor (rev3 - 8) with LTE (DT)
> [  292.534326] Call trace:
> [  292.534328]  dump_backtrace+0x0/0x1d4
> [  292.534337]  show_stack+0x20/0x2c
> [  292.534342]  dump_stack_lvl+0x60/0x78
> [  292.534347]  dump_stack+0x18/0x38
> [  292.534352]  panic+0x148/0x3b0
> [  292.534357]  nmi_panic+0x80/0x94
> [  292.534364]  arm64_serror_panic+0x70/0x7c
> [  292.534369]  do_serror+0x0/0x7c
> [  292.534372]  do_serror+0x54/0x7c
> [  292.534377]  el1h_64_error_handler+0x34/0x4c
> [  292.534381]  el1h_64_error+0x7c/0x80
> [  292.534386]  el1_interrupt+0x20/0x58
> [  292.534389]  el1h_64_irq_handler+0x18/0x24
> [  292.534395]  el1h_64_irq+0x7c/0x80
> [  292.534399]  local_daif_inherit+0x10/0x18
> [  292.534405]  el1h_64_sync_handler+0x48/0xb4
> [  292.534410]  el1h_64_sync+0x7c/0x80
> [  292.534414]  a6xx_gmu_set_oob+0xbc/0x1fc
> [  292.534422]  a6xx_get_timestamp+0x40/0xb4
> [  292.534426]  adreno_get_param+0x12c/0x1e0
> [  292.534433]  msm_ioctl_get_param+0x64/0x70
> [  292.534440]  drm_ioctl_kernel+0xe8/0x158
> [  292.534448]  drm_ioctl+0x208/0x320
> [  292.534453]  __arm64_sys_ioctl+0x98/0xd0
> [  292.534461]  invoke_syscall+0x4c/0x118
>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Ricardo Ribalda <ribalda@google.com>
> Cc: Ross Zwisler <zwisler@google.com>
> Cc: Rob Clark <robdclark@gmail.com>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Tested-by: Ricardo Ribalda <ribalda@google.com>
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> ---
>  kernel/kexec_core.c | 6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
> index e2f2574d8b74..6599f485e42d 100644
> --- a/kernel/kexec_core.c
> +++ b/kernel/kexec_core.c
> @@ -1299,6 +1299,12 @@ int kernel_kexec(void)
>  	} else
>  #endif
>  	{
> +		error = freeze_processes();
> +		if (error) {
> +			error = -EBUSY;
> +			goto Unlock;
> +		}
> +
>  		kexec_in_progress = true;
>  		kernel_restart_prepare("kexec reboot");
>  		migrate_to_reboot_cpu();

Eric

Joel Fernandes Oct. 2, 2023, 6:18 p.m. UTC | #2

Hi Eric,

On Fri, Sep 29, 2023 at 12:01 PM Eric W. Biederman
<ebiederm@xmission.com> wrote:
>
> "Joel Fernandes (Google)" <joel@joelfernandes.org> writes:
>
> > During kexec reboot, it is possible for a race to occur between
> > device_shutdown() and userspace.  This causes accesses to GPU after pm_runtime
> > suspend has already happened. Fix this by calling freeze_processes() before
> > device_shutdown().
>
> Is there any reason why this same race with between sys_kexec and the
> adreno_ioctl can not happen during a normal reboot?

Thanks for the response. It can happen during a normal reboot. I think
the reason it does not show up in the wild is because the "reboot"
command implementation typically sends one of SIGSTOP or SIGKILL to
all processes which effectively prevents the race.

In any case, there is also a school of thought that says the kernel
should be resilient to crashes and a userspace workaround involving
sending signals could be looked at as papering over the real issue. I
do sympathize/agree with that school of thought as well.

> Is there any reason why there is not a .shutdown method to prevent the
> race?
> I would think the thing to do is to prevent this race in
> kernel_restart_prepare or in the GPUs .shutdown method.  As I don't see
> anything that would prevent this during a normal reboot.

What you're saying is essentially what I remember trying, the issue is
not in the GPU driver but rather there the interconnect in the SoC
shutdown and causes an "SError" exception if the CPU tries to access
the memory locations, as also seen in the stack. I was not able to
trace exactly when the interconnect becomes unavailable and perhaps
there is a possibility of a more intricate fix where we can signal the
GPU driver to not access the bus anymore, but my suspicion is that
will add a lot of complexity and perhaps leave the door open to
similar issues.

> > Such freezing is already being done if kernel supports KEXEC_JUMP and
> > kexec_image->preserve_context is true. However, doing it if either of these are
> > not true prevents crashes/races.
>
> The KEXEC_JUMP case is something else entirely.  It is supposed to work
> like suspend to RAM.  Maybe reboot should as well, but I am
> uncomfortable making a generic device fix kexec specific.

I see your point of view. I think regular reboot should also be fixed
to avoid similar crash possibilities. I am happy to make a change for
that similar to this patch if we want to proceed that way.

Thoughts?

thanks,

 - Joel

Joel Fernandes Oct. 8, 2023, 1:30 a.m. UTC | #3

On Mon, Oct 2, 2023 at 2:18 PM Joel Fernandes <joel@joelfernandes.org> wrote:
[..]
> > > Such freezing is already being done if kernel supports KEXEC_JUMP and
> > > kexec_image->preserve_context is true. However, doing it if either of these are
> > > not true prevents crashes/races.
> >
> > The KEXEC_JUMP case is something else entirely.  It is supposed to work
> > like suspend to RAM.  Maybe reboot should as well, but I am
> > uncomfortable making a generic device fix kexec specific.
>
> I see your point of view. I think regular reboot should also be fixed
> to avoid similar crash possibilities. I am happy to make a change for
> that similar to this patch if we want to proceed that way.
>
> Thoughts?

Just checking how we want to proceed, is the consensus that we should
prevent kernel crashes without relying on userspace stopping all
processes? Should we fix regular reboot syscall as well and not just
kexec reboot?

thanks,

 - Joel

Steven Rostedt Oct. 9, 2023, 2:01 p.m. UTC | #4

On Sat, 7 Oct 2023 21:30:42 -0400
Joel Fernandes <joel@joelfernandes.org> wrote:

> Just checking how we want to proceed, is the consensus that we should
> prevent kernel crashes without relying on userspace stopping all
> processes? Should we fix regular reboot syscall as well and not just
> kexec reboot?

If you can show that we can trigger the crash on normal reboot, then I
don't see why not. That is, if you have a program that does the reboot
(without the SIGSTOP/SIGKILL calls) and triggers this crash, I think that's
a legitimate reason to fix it on normal reboot too.

-- Steve

Eric W. Biederman Oct. 9, 2023, 3:21 p.m. UTC | #5

Joel Fernandes <joel@joelfernandes.org> writes:

> On Mon, Oct 2, 2023 at 2:18 PM Joel Fernandes <joel@joelfernandes.org> wrote:
> [..]
>> > > Such freezing is already being done if kernel supports KEXEC_JUMP and
>> > > kexec_image->preserve_context is true. However, doing it if either of these are
>> > > not true prevents crashes/races.
>> >
>> > The KEXEC_JUMP case is something else entirely.  It is supposed to work
>> > like suspend to RAM.  Maybe reboot should as well, but I am
>> > uncomfortable making a generic device fix kexec specific.
>>
>> I see your point of view. I think regular reboot should also be fixed
>> to avoid similar crash possibilities. I am happy to make a change for
>> that similar to this patch if we want to proceed that way.
>>
>> Thoughts?
>
> Just checking how we want to proceed, is the consensus that we should
> prevent kernel crashes without relying on userspace stopping all
> processes? Should we fix regular reboot syscall as well and not just
> kexec reboot?

It just occurred to me there is something very fishy about all of this.

What userspace do you have using kexec (not kexec on panic) that doesn't
preform the same userspace shutdown as a normal reboot?

Quite frankly such a userspace is buggy, and arguably that is where you
should start fixing things.  That way you can get the orderly shutdown
of userspace daemons/services along with an orderly shutdown of
everything the kernel is responsible for.

At the kernel level a kexec reboot and a normal reboot have been
deliberately kept as close as possible.  Which is why I say we should
fix it in reboot.

Eric

Joel Fernandes Oct. 10, 2023, 8:37 p.m. UTC | #6

On Mon, Oct 9, 2023 at 11:30 AM Eric W. Biederman <ebiederm@xmission.com> wrote:
>
> Joel Fernandes <joel@joelfernandes.org> writes:
>
> > On Mon, Oct 2, 2023 at 2:18 PM Joel Fernandes <joel@joelfernandes.org> wrote:
> > [..]
> >> > > Such freezing is already being done if kernel supports KEXEC_JUMP and
> >> > > kexec_image->preserve_context is true. However, doing it if either of these are
> >> > > not true prevents crashes/races.
> >> >
> >> > The KEXEC_JUMP case is something else entirely.  It is supposed to work
> >> > like suspend to RAM.  Maybe reboot should as well, but I am
> >> > uncomfortable making a generic device fix kexec specific.
> >>
> >> I see your point of view. I think regular reboot should also be fixed
> >> to avoid similar crash possibilities. I am happy to make a change for
> >> that similar to this patch if we want to proceed that way.
> >>
> >> Thoughts?
> >
> > Just checking how we want to proceed, is the consensus that we should
> > prevent kernel crashes without relying on userspace stopping all
> > processes? Should we fix regular reboot syscall as well and not just
> > kexec reboot?
>
> It just occurred to me there is something very fishy about all of this.
>
> What userspace do you have using kexec (not kexec on panic) that doesn't
> preform the same userspace shutdown as a normal reboot?
>
> Quite frankly such a userspace is buggy, and arguably that is where you
> should start fixing things.

It is a simple unit test that tests kexec support by kexec-rebooting
the kernel. I don't think SIGSTOP/SIGKILL'ing during kexec-reboot is
ideal because in a real panic-on-kexec type crash, that may not happen
and so does not emulate the real world that well. I think we want the
kexec-reboot to do a *reboot* without crashing the kernel while doing
so. Ricardo/Steve can chime on what they feel as well.

> That way you can get the orderly shutdown
> of userspace daemons/services along with an orderly shutdown of
> everything the kernel is responsible for.

Fixing in userspace is an option but people are not happy that the
kernel can crash like that.

> At the kernel level a kexec reboot and a normal reboot have been
> deliberately kept as close as possible.  Which is why I say we should
> fix it in reboot.

You mean fix it in userspace?

thanks,

 - Joel

Joel Fernandes Oct. 10, 2023, 8:43 p.m. UTC | #7

On Mon, Oct 9, 2023 at 10:00 AM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Sat, 7 Oct 2023 21:30:42 -0400
> Joel Fernandes <joel@joelfernandes.org> wrote:
>
> > Just checking how we want to proceed, is the consensus that we should
> > prevent kernel crashes without relying on userspace stopping all
> > processes? Should we fix regular reboot syscall as well and not just
> > kexec reboot?
>
> If you can show that we can trigger the crash on normal reboot, then I
> don't see why not. That is, if you have a program that does the reboot
> (without the SIGSTOP/SIGKILL calls) and triggers this crash, I think that's
> a legitimate reason to fix it on normal reboot too.

Ok, Sounds good, thanks for sharing your thoughts.

 - Joel

Eric W. Biederman Oct. 10, 2023, 9:07 p.m. UTC | #8

Joel Fernandes <joel@joelfernandes.org> writes:

> On Mon, Oct 9, 2023 at 11:30 AM Eric W. Biederman <ebiederm@xmission.com> wrote:
>>
>> Joel Fernandes <joel@joelfernandes.org> writes:
>>
>> > On Mon, Oct 2, 2023 at 2:18 PM Joel Fernandes <joel@joelfernandes.org> wrote:
>> > [..]
>> >> > > Such freezing is already being done if kernel supports KEXEC_JUMP and
>> >> > > kexec_image->preserve_context is true. However, doing it if either of these are
>> >> > > not true prevents crashes/races.
>> >> >
>> >> > The KEXEC_JUMP case is something else entirely.  It is supposed to work
>> >> > like suspend to RAM.  Maybe reboot should as well, but I am
>> >> > uncomfortable making a generic device fix kexec specific.
>> >>
>> >> I see your point of view. I think regular reboot should also be fixed
>> >> to avoid similar crash possibilities. I am happy to make a change for
>> >> that similar to this patch if we want to proceed that way.
>> >>
>> >> Thoughts?
>> >
>> > Just checking how we want to proceed, is the consensus that we should
>> > prevent kernel crashes without relying on userspace stopping all
>> > processes? Should we fix regular reboot syscall as well and not just
>> > kexec reboot?
>>
>> It just occurred to me there is something very fishy about all of this.
>>
>> What userspace do you have using kexec (not kexec on panic) that doesn't
>> preform the same userspace shutdown as a normal reboot?
>>
>> Quite frankly such a userspace is buggy, and arguably that is where you
>> should start fixing things.
>
> It is a simple unit test that tests kexec support by kexec-rebooting
> the kernel. I don't think SIGSTOP/SIGKILL'ing during kexec-reboot is
> ideal because in a real panic-on-kexec type crash, that may not happen
> and so does not emulate the real world that well. I think we want the
> kexec-reboot to do a *reboot* without crashing the kernel while doing
> so. Ricardo/Steve can chime on what they feel as well.

This is confusing.  You have a unit test that, that tests kexec on
panic using a the full kexec reboot.

The two are fundamentally similar but you aren't going to have a valid
test case if you mix them.

There is a whole kernel module that tests more interesting cases,
for the simple case you probably just want to do:

	echo 'p' > /proc/sysrq-trigger

At least I think it is p that causes a kernel-panic.

That will ensure you are exercising the kexec on panic code path.  That
performs the minimal shutdown in the kernel.

>> That way you can get the orderly shutdown
>> of userspace daemons/services along with an orderly shutdown of
>> everything the kernel is responsible for.
>
> Fixing in userspace is an option but people are not happy that the
> kernel can crash like that.

In a kexec on panic scenario the kernel needs to perform that absolute
bare essential shutdown before calling kexec (basically nothing).
During kexec-on-panic nothing can be relied upon because we don't know
what is broken.  If that is what you care about (as suggested by the
unit test) you need to fix the device initialization.

In a normal kexec scenario the whole normal reboot process is expected.
I have no problems with fixing the kernel to handle that scenario,
but in the real world the entire orderly shutdown both, kernel
and userspace should be performed.

>> At the kernel level a kexec reboot and a normal reboot have been
>> deliberately kept as close as possible.  Which is why I say we should
>> fix it in reboot.
>
> You mean fix it in userspace?

No.  I mean in the kernel the orderly shutdown for a kexec reboot and an
ordinary reboot are kept as close to the same as possible.

It should be the case that the only differences between the two is that
in once case system firmware takes over after the orderly shutdown,
and in the other case a new kernel takes over after the orderly shutdown.

Eric

Joel Fernandes Oct. 11, 2023, 3:52 p.m. UTC | #9

On Tue, Oct 10, 2023 at 5:08 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
>
> Joel Fernandes <joel@joelfernandes.org> writes:
[...]
> >> That way you can get the orderly shutdown
> >> of userspace daemons/services along with an orderly shutdown of
> >> everything the kernel is responsible for.
> >
> > Fixing in userspace is an option but people are not happy that the
> > kernel can crash like that.
>
> In a kexec on panic scenario the kernel needs to perform that absolute
> bare essential shutdown before calling kexec (basically nothing).
> During kexec-on-panic nothing can be relied upon because we don't know
> what is broken.  If that is what you care about (as suggested by the
> unit test) you need to fix the device initialization.
>
> In a normal kexec scenario the whole normal reboot process is expected.
> I have no problems with fixing the kernel to handle that scenario,
> but in the real world the entire orderly shutdown both, kernel
> and userspace should be performed.

Sounds good. Since you mentioned you have no problem with fixing
regular reboot in the kernel, we will work on reproducing the issue
with regular reboot as well and fix that.

I think a syscall causing the kernel to crash instead of operate
normally is a cause of concern, so let us fix the kernel as well
(other than improving the test case as you mentioned).

> >> At the kernel level a kexec reboot and a normal reboot have been
> >> deliberately kept as close as possible.  Which is why I say we should
> >> fix it in reboot.
> >
> > You mean fix it in userspace?
>
> No.  I mean in the kernel the orderly shutdown for a kexec reboot and an
> ordinary reboot are kept as close to the same as possible.
>
> It should be the case that the only differences between the two is that
> in once case system firmware takes over after the orderly shutdown,
> and in the other case a new kernel takes over after the orderly shutdown.

Agreed.

thanks,

 - Joel

diff mbox series

Patch

diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index e2f2574d8b74..6599f485e42d 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -1299,6 +1299,12 @@  int kernel_kexec(void)
 	} else
 #endif
 	{
+		error = freeze_processes();
+		if (error) {
+			error = -EBUSY;
+			goto Unlock;
+		}
+
 		kexec_in_progress = true;
 		kernel_restart_prepare("kexec reboot");
 		migrate_to_reboot_cpu();