Message ID | 20230929021213.2364883-1-joel@joelfernandes.org |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:cae8:0:b0:403:3b70:6f57 with SMTP id r8csp3924283vqu; Fri, 29 Sep 2023 03:34:28 -0700 (PDT) X-Google-Smtp-Source: AGHT+IGku2XAgQBLXc0qClXn85x+DZbjvIlY6rS8uSm03JemQwTR79CHfWR5BTAlhCwJV5WA3NPm X-Received: by 2002:a05:6a20:3d85:b0:133:f0b9:856d with SMTP id s5-20020a056a203d8500b00133f0b9856dmr4467288pzi.17.1695983668061; Fri, 29 Sep 2023 03:34:28 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1695983668; cv=none; d=google.com; s=arc-20160816; b=tb4x9R8AL6AE3hDUNzZVJjHfFiz0uJKBktdCAh+ZXWWa6Dva1Li2zPbgl7m1TuMH9B XtIIjEelu3HGkPn+FccMhDMmrZ6T2URFtgB3CM7Eewwm6k+3B99q/99uikmG87rk8niS SFSmdV2wobQZrMMu2Bi8C2LFVG6Q8wwUpaOBUATx48JDtmz2Nw/bDWS/yqcksWDnN1QP fhUe/TWFOW3tsuq4b+bL0A9iH35YGqINpY8gYxaYPQRe3ydO6LMTsIpPjDiRIy+YT/Ou E7uQbScAc5dlk86KIzaaxN/ywzEBRuDHFiRnDhz4ysND//9Bhh2TzoYtfLsTKnO2Y/Ne m3cg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :message-id:date:subject:cc:to:from:dkim-signature; bh=Aqq4K4J4v3GU1zpra4AFU6+X76RkkP22ZNAXzHknflA=; fh=wdhAe/PP65B3vs4fw9uQ6ZGxG+a80ZvqOMFezYxWEgg=; b=g4rPhAMi+GFyqI4Jf0hwM2ERrWEIM9smjghrc4IUt6JNEPuqj5YROGda22qf/PU1yt 3bSIjRIjtqvOhn0co2yCDUjDRJhrqZa5+Z+LdRifZIRvND2oiNwsb5rTh5lZEu+7PMog MztVLC3lfqci3D7bAItj9nG/G5GsImAXvgZp/itUcRl2kq49yOshyaafc3c/clniwBWz EmhgH1LJ5wvK2y9Ha7on7Wj/CCTzfUahcNkpM5+i4TsKETC2zXVyL8kZv7tFOWrN9PGw qdniliSnPvKzf6ryisBLPa+92nbsY+MZ02WY9as9WHUSo1Y6onKZZHszr21rGaNUE/RZ QjTw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@joelfernandes.org header.s=google header.b=N63ntl2J; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.33 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: from lipwig.vger.email (lipwig.vger.email. [23.128.96.33]) by mx.google.com with ESMTPS id bt24-20020a17090af01800b002770c306d55si1283585pjb.87.2023.09.29.03.34.27 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 29 Sep 2023 03:34:28 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.33 as permitted sender) client-ip=23.128.96.33; Authentication-Results: mx.google.com; dkim=pass header.i=@joelfernandes.org header.s=google header.b=N63ntl2J; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.33 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by lipwig.vger.email (Postfix) with ESMTP id 7069A8380FE8; Thu, 28 Sep 2023 19:12:40 -0700 (PDT) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.10 at lipwig.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229912AbjI2CMV (ORCPT <rfc822;pwkd43@gmail.com> + 21 others); Thu, 28 Sep 2023 22:12:21 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33692 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229654AbjI2CMU (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Thu, 28 Sep 2023 22:12:20 -0400 Received: from mail-io1-xd2c.google.com (mail-io1-xd2c.google.com [IPv6:2607:f8b0:4864:20::d2c]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id DF51A19F for <linux-kernel@vger.kernel.org>; Thu, 28 Sep 2023 19:12:18 -0700 (PDT) Received: by mail-io1-xd2c.google.com with SMTP id ca18e2360f4ac-79f96f83270so437473739f.1 for <linux-kernel@vger.kernel.org>; Thu, 28 Sep 2023 19:12:18 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=joelfernandes.org; s=google; t=1695953538; x=1696558338; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:message-id:date:subject:cc :to:from:from:to:cc:subject:date:message-id:reply-to; bh=Aqq4K4J4v3GU1zpra4AFU6+X76RkkP22ZNAXzHknflA=; b=N63ntl2JnuOvbniZltySw41zVfW6TYu0zK/8h9YxrKGH/+KM0WH1rfBuWQrTmWG6dn mVVH7OKN4+AqlRvdSP14TgYSpF2pkOWiDOggMBVYiMXhJUcpT5nOT5wBbv5JG/ydnhbi LpYw3j1pObAa31eigGv38hDuTbKxJa0GREC5c= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1695953538; x=1696558338; h=content-transfer-encoding:mime-version:message-id:date:subject:cc :to:from:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=Aqq4K4J4v3GU1zpra4AFU6+X76RkkP22ZNAXzHknflA=; b=ZKhyqr5NQYLrpo2S6q+EevS8I49j1y/Njjf03S5C4yQf4SgN0CZgaI6y5Gs0ljz/u4 Tj3v8TuEGoGtTHwG+NVmRFrlNAwdKI27iPkJqxhQxpbP+IiyYt4GEINRP+evbom7drSM cj7VomS207xK35bvnywwmqQLt0e0lr8BK+8n50k/T4nfOKSbunG5qGWaiQ2M4a9C6+A1 b09oE/j8nm+p+9JTOcEzxXCMwf0ZWSy+8cbL5X7RWifWMdYHshCpMbQn/4z2QG2mXsvZ jeMlRGjqV1YbSla0m6s7Qguffq1knKpKc0mT3VtZA1hVCJUQH+WBje9RA4UZ38NFuUd4 v1kQ== X-Gm-Message-State: AOJu0Yy+3kS4hgtVm9sbcOv55vhjX20frl8zTUdX3X9xXMfdubyUIl6r ZSttYGuGEhtpWOyZM6E52IdFVWe8I1cZACxPGQs= X-Received: by 2002:a05:6e02:152e:b0:351:5acb:271 with SMTP id i14-20020a056e02152e00b003515acb0271mr3427556ilu.1.1695953537852; Thu, 28 Sep 2023 19:12:17 -0700 (PDT) Received: from joelboxx5.c.googlers.com.com (161.74.123.34.bc.googleusercontent.com. [34.123.74.161]) by smtp.gmail.com with ESMTPSA id g6-20020a02c546000000b004290f6c15bfsm4937018jaj.145.2023.09.28.19.12.16 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 28 Sep 2023 19:12:16 -0700 (PDT) From: "Joel Fernandes (Google)" <joel@joelfernandes.org> To: linux-kernel@vger.kernel.org, Eric Biederman <ebiederm@xmission.com> Cc: "Joel Fernandes (Google)" <joel@joelfernandes.org>, Steven Rostedt <rostedt@goodmis.org>, Ricardo Ribalda <ribalda@google.com>, Ross Zwisler <zwisler@google.com>, Rob Clark <robdclark@gmail.com>, Linus Torvalds <torvalds@linux-foundation.org>, kexec@lists.infradead.org Subject: [PATCH] kexec: Fix reboot race during device_shutdown() Date: Fri, 29 Sep 2023 02:12:12 +0000 Message-ID: <20230929021213.2364883-1-joel@joelfernandes.org> X-Mailer: git-send-email 2.42.0.582.g8ccd20d70d-goog MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=0.8 required=5.0 tests=DKIM_SIGNED,DKIM_VALID, DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI, SPF_HELO_NONE,SPF_PASS,URIBL_BLACK autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lipwig.vger.email Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (lipwig.vger.email [0.0.0.0]); Thu, 28 Sep 2023 19:12:40 -0700 (PDT) X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1778367770797058160 X-GMAIL-MSGID: 1778367770797058160 |
Series |
kexec: Fix reboot race during device_shutdown()
|
|
Commit Message
Joel Fernandes
Sept. 29, 2023, 2:12 a.m. UTC
During kexec reboot, it is possible for a race to occur between device_shutdown() and userspace. This causes accesses to GPU after pm_runtime suspend has already happened. Fix this by calling freeze_processes() before device_shutdown(). Such freezing is already being done if kernel supports KEXEC_JUMP and kexec_image->preserve_context is true. However, doing it if either of these are not true prevents crashes/races. This fixes the following crash during kexec reboot on an ARM64 device with adreno GPU: [ 292.534314] Kernel panic - not syncing: Asynchronous SError Interrupt [ 292.534323] Hardware name: Google Lazor (rev3 - 8) with LTE (DT) [ 292.534326] Call trace: [ 292.534328] dump_backtrace+0x0/0x1d4 [ 292.534337] show_stack+0x20/0x2c [ 292.534342] dump_stack_lvl+0x60/0x78 [ 292.534347] dump_stack+0x18/0x38 [ 292.534352] panic+0x148/0x3b0 [ 292.534357] nmi_panic+0x80/0x94 [ 292.534364] arm64_serror_panic+0x70/0x7c [ 292.534369] do_serror+0x0/0x7c [ 292.534372] do_serror+0x54/0x7c [ 292.534377] el1h_64_error_handler+0x34/0x4c [ 292.534381] el1h_64_error+0x7c/0x80 [ 292.534386] el1_interrupt+0x20/0x58 [ 292.534389] el1h_64_irq_handler+0x18/0x24 [ 292.534395] el1h_64_irq+0x7c/0x80 [ 292.534399] local_daif_inherit+0x10/0x18 [ 292.534405] el1h_64_sync_handler+0x48/0xb4 [ 292.534410] el1h_64_sync+0x7c/0x80 [ 292.534414] a6xx_gmu_set_oob+0xbc/0x1fc [ 292.534422] a6xx_get_timestamp+0x40/0xb4 [ 292.534426] adreno_get_param+0x12c/0x1e0 [ 292.534433] msm_ioctl_get_param+0x64/0x70 [ 292.534440] drm_ioctl_kernel+0xe8/0x158 [ 292.534448] drm_ioctl+0x208/0x320 [ 292.534453] __arm64_sys_ioctl+0x98/0xd0 [ 292.534461] invoke_syscall+0x4c/0x118 Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Ricardo Ribalda <ribalda@google.com> Cc: Ross Zwisler <zwisler@google.com> Cc: Rob Clark <robdclark@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Tested-by: Ricardo Ribalda <ribalda@google.com> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> --- kernel/kexec_core.c | 6 ++++++ 1 file changed, 6 insertions(+)
Comments
"Joel Fernandes (Google)" <joel@joelfernandes.org> writes: > During kexec reboot, it is possible for a race to occur between > device_shutdown() and userspace. This causes accesses to GPU after pm_runtime > suspend has already happened. Fix this by calling freeze_processes() before > device_shutdown(). Is there any reason why this same race with between sys_kexec and the adreno_ioctl can not happen during a normal reboot? Is there any reason why there is not a .shutdown method to prevent the race? I would think the thing to do is to prevent this race in kernel_restart_prepare or in the GPUs .shutdown method. As I don't see anything that would prevent this during a normal reboot. > > Such freezing is already being done if kernel supports KEXEC_JUMP and > kexec_image->preserve_context is true. However, doing it if either of these are > not true prevents crashes/races. The KEXEC_JUMP case is something else entirely. It is supposed to work like suspend to RAM. Maybe reboot should as well, but I am uncomfortable making a generic device fix kexec specific. > This fixes the following crash during kexec reboot on an ARM64 device > with adreno GPU: > > [ 292.534314] Kernel panic - not syncing: Asynchronous SError Interrupt > [ 292.534323] Hardware name: Google Lazor (rev3 - 8) with LTE (DT) > [ 292.534326] Call trace: > [ 292.534328] dump_backtrace+0x0/0x1d4 > [ 292.534337] show_stack+0x20/0x2c > [ 292.534342] dump_stack_lvl+0x60/0x78 > [ 292.534347] dump_stack+0x18/0x38 > [ 292.534352] panic+0x148/0x3b0 > [ 292.534357] nmi_panic+0x80/0x94 > [ 292.534364] arm64_serror_panic+0x70/0x7c > [ 292.534369] do_serror+0x0/0x7c > [ 292.534372] do_serror+0x54/0x7c > [ 292.534377] el1h_64_error_handler+0x34/0x4c > [ 292.534381] el1h_64_error+0x7c/0x80 > [ 292.534386] el1_interrupt+0x20/0x58 > [ 292.534389] el1h_64_irq_handler+0x18/0x24 > [ 292.534395] el1h_64_irq+0x7c/0x80 > [ 292.534399] local_daif_inherit+0x10/0x18 > [ 292.534405] el1h_64_sync_handler+0x48/0xb4 > [ 292.534410] el1h_64_sync+0x7c/0x80 > [ 292.534414] a6xx_gmu_set_oob+0xbc/0x1fc > [ 292.534422] a6xx_get_timestamp+0x40/0xb4 > [ 292.534426] adreno_get_param+0x12c/0x1e0 > [ 292.534433] msm_ioctl_get_param+0x64/0x70 > [ 292.534440] drm_ioctl_kernel+0xe8/0x158 > [ 292.534448] drm_ioctl+0x208/0x320 > [ 292.534453] __arm64_sys_ioctl+0x98/0xd0 > [ 292.534461] invoke_syscall+0x4c/0x118 > > Cc: Steven Rostedt <rostedt@goodmis.org> > Cc: Ricardo Ribalda <ribalda@google.com> > Cc: Ross Zwisler <zwisler@google.com> > Cc: Rob Clark <robdclark@gmail.com> > Cc: Linus Torvalds <torvalds@linux-foundation.org> > Tested-by: Ricardo Ribalda <ribalda@google.com> > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> > --- > kernel/kexec_core.c | 6 ++++++ > 1 file changed, 6 insertions(+) > > diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c > index e2f2574d8b74..6599f485e42d 100644 > --- a/kernel/kexec_core.c > +++ b/kernel/kexec_core.c > @@ -1299,6 +1299,12 @@ int kernel_kexec(void) > } else > #endif > { > + error = freeze_processes(); > + if (error) { > + error = -EBUSY; > + goto Unlock; > + } > + > kexec_in_progress = true; > kernel_restart_prepare("kexec reboot"); > migrate_to_reboot_cpu(); Eric
Hi Eric, On Fri, Sep 29, 2023 at 12:01 PM Eric W. Biederman <ebiederm@xmission.com> wrote: > > "Joel Fernandes (Google)" <joel@joelfernandes.org> writes: > > > During kexec reboot, it is possible for a race to occur between > > device_shutdown() and userspace. This causes accesses to GPU after pm_runtime > > suspend has already happened. Fix this by calling freeze_processes() before > > device_shutdown(). > > Is there any reason why this same race with between sys_kexec and the > adreno_ioctl can not happen during a normal reboot? Thanks for the response. It can happen during a normal reboot. I think the reason it does not show up in the wild is because the "reboot" command implementation typically sends one of SIGSTOP or SIGKILL to all processes which effectively prevents the race. In any case, there is also a school of thought that says the kernel should be resilient to crashes and a userspace workaround involving sending signals could be looked at as papering over the real issue. I do sympathize/agree with that school of thought as well. > Is there any reason why there is not a .shutdown method to prevent the > race? > I would think the thing to do is to prevent this race in > kernel_restart_prepare or in the GPUs .shutdown method. As I don't see > anything that would prevent this during a normal reboot. What you're saying is essentially what I remember trying, the issue is not in the GPU driver but rather there the interconnect in the SoC shutdown and causes an "SError" exception if the CPU tries to access the memory locations, as also seen in the stack. I was not able to trace exactly when the interconnect becomes unavailable and perhaps there is a possibility of a more intricate fix where we can signal the GPU driver to not access the bus anymore, but my suspicion is that will add a lot of complexity and perhaps leave the door open to similar issues. > > Such freezing is already being done if kernel supports KEXEC_JUMP and > > kexec_image->preserve_context is true. However, doing it if either of these are > > not true prevents crashes/races. > > The KEXEC_JUMP case is something else entirely. It is supposed to work > like suspend to RAM. Maybe reboot should as well, but I am > uncomfortable making a generic device fix kexec specific. I see your point of view. I think regular reboot should also be fixed to avoid similar crash possibilities. I am happy to make a change for that similar to this patch if we want to proceed that way. Thoughts? thanks, - Joel
On Mon, Oct 2, 2023 at 2:18 PM Joel Fernandes <joel@joelfernandes.org> wrote: [..] > > > Such freezing is already being done if kernel supports KEXEC_JUMP and > > > kexec_image->preserve_context is true. However, doing it if either of these are > > > not true prevents crashes/races. > > > > The KEXEC_JUMP case is something else entirely. It is supposed to work > > like suspend to RAM. Maybe reboot should as well, but I am > > uncomfortable making a generic device fix kexec specific. > > I see your point of view. I think regular reboot should also be fixed > to avoid similar crash possibilities. I am happy to make a change for > that similar to this patch if we want to proceed that way. > > Thoughts? Just checking how we want to proceed, is the consensus that we should prevent kernel crashes without relying on userspace stopping all processes? Should we fix regular reboot syscall as well and not just kexec reboot? thanks, - Joel
On Sat, 7 Oct 2023 21:30:42 -0400 Joel Fernandes <joel@joelfernandes.org> wrote: > Just checking how we want to proceed, is the consensus that we should > prevent kernel crashes without relying on userspace stopping all > processes? Should we fix regular reboot syscall as well and not just > kexec reboot? If you can show that we can trigger the crash on normal reboot, then I don't see why not. That is, if you have a program that does the reboot (without the SIGSTOP/SIGKILL calls) and triggers this crash, I think that's a legitimate reason to fix it on normal reboot too. -- Steve
Joel Fernandes <joel@joelfernandes.org> writes: > On Mon, Oct 2, 2023 at 2:18 PM Joel Fernandes <joel@joelfernandes.org> wrote: > [..] >> > > Such freezing is already being done if kernel supports KEXEC_JUMP and >> > > kexec_image->preserve_context is true. However, doing it if either of these are >> > > not true prevents crashes/races. >> > >> > The KEXEC_JUMP case is something else entirely. It is supposed to work >> > like suspend to RAM. Maybe reboot should as well, but I am >> > uncomfortable making a generic device fix kexec specific. >> >> I see your point of view. I think regular reboot should also be fixed >> to avoid similar crash possibilities. I am happy to make a change for >> that similar to this patch if we want to proceed that way. >> >> Thoughts? > > Just checking how we want to proceed, is the consensus that we should > prevent kernel crashes without relying on userspace stopping all > processes? Should we fix regular reboot syscall as well and not just > kexec reboot? It just occurred to me there is something very fishy about all of this. What userspace do you have using kexec (not kexec on panic) that doesn't preform the same userspace shutdown as a normal reboot? Quite frankly such a userspace is buggy, and arguably that is where you should start fixing things. That way you can get the orderly shutdown of userspace daemons/services along with an orderly shutdown of everything the kernel is responsible for. At the kernel level a kexec reboot and a normal reboot have been deliberately kept as close as possible. Which is why I say we should fix it in reboot. Eric
On Mon, Oct 9, 2023 at 11:30 AM Eric W. Biederman <ebiederm@xmission.com> wrote: > > Joel Fernandes <joel@joelfernandes.org> writes: > > > On Mon, Oct 2, 2023 at 2:18 PM Joel Fernandes <joel@joelfernandes.org> wrote: > > [..] > >> > > Such freezing is already being done if kernel supports KEXEC_JUMP and > >> > > kexec_image->preserve_context is true. However, doing it if either of these are > >> > > not true prevents crashes/races. > >> > > >> > The KEXEC_JUMP case is something else entirely. It is supposed to work > >> > like suspend to RAM. Maybe reboot should as well, but I am > >> > uncomfortable making a generic device fix kexec specific. > >> > >> I see your point of view. I think regular reboot should also be fixed > >> to avoid similar crash possibilities. I am happy to make a change for > >> that similar to this patch if we want to proceed that way. > >> > >> Thoughts? > > > > Just checking how we want to proceed, is the consensus that we should > > prevent kernel crashes without relying on userspace stopping all > > processes? Should we fix regular reboot syscall as well and not just > > kexec reboot? > > It just occurred to me there is something very fishy about all of this. > > What userspace do you have using kexec (not kexec on panic) that doesn't > preform the same userspace shutdown as a normal reboot? > > Quite frankly such a userspace is buggy, and arguably that is where you > should start fixing things. It is a simple unit test that tests kexec support by kexec-rebooting the kernel. I don't think SIGSTOP/SIGKILL'ing during kexec-reboot is ideal because in a real panic-on-kexec type crash, that may not happen and so does not emulate the real world that well. I think we want the kexec-reboot to do a *reboot* without crashing the kernel while doing so. Ricardo/Steve can chime on what they feel as well. > That way you can get the orderly shutdown > of userspace daemons/services along with an orderly shutdown of > everything the kernel is responsible for. Fixing in userspace is an option but people are not happy that the kernel can crash like that. > At the kernel level a kexec reboot and a normal reboot have been > deliberately kept as close as possible. Which is why I say we should > fix it in reboot. You mean fix it in userspace? thanks, - Joel
On Mon, Oct 9, 2023 at 10:00 AM Steven Rostedt <rostedt@goodmis.org> wrote: > > On Sat, 7 Oct 2023 21:30:42 -0400 > Joel Fernandes <joel@joelfernandes.org> wrote: > > > Just checking how we want to proceed, is the consensus that we should > > prevent kernel crashes without relying on userspace stopping all > > processes? Should we fix regular reboot syscall as well and not just > > kexec reboot? > > If you can show that we can trigger the crash on normal reboot, then I > don't see why not. That is, if you have a program that does the reboot > (without the SIGSTOP/SIGKILL calls) and triggers this crash, I think that's > a legitimate reason to fix it on normal reboot too. Ok, Sounds good, thanks for sharing your thoughts. - Joel
Joel Fernandes <joel@joelfernandes.org> writes: > On Mon, Oct 9, 2023 at 11:30 AM Eric W. Biederman <ebiederm@xmission.com> wrote: >> >> Joel Fernandes <joel@joelfernandes.org> writes: >> >> > On Mon, Oct 2, 2023 at 2:18 PM Joel Fernandes <joel@joelfernandes.org> wrote: >> > [..] >> >> > > Such freezing is already being done if kernel supports KEXEC_JUMP and >> >> > > kexec_image->preserve_context is true. However, doing it if either of these are >> >> > > not true prevents crashes/races. >> >> > >> >> > The KEXEC_JUMP case is something else entirely. It is supposed to work >> >> > like suspend to RAM. Maybe reboot should as well, but I am >> >> > uncomfortable making a generic device fix kexec specific. >> >> >> >> I see your point of view. I think regular reboot should also be fixed >> >> to avoid similar crash possibilities. I am happy to make a change for >> >> that similar to this patch if we want to proceed that way. >> >> >> >> Thoughts? >> > >> > Just checking how we want to proceed, is the consensus that we should >> > prevent kernel crashes without relying on userspace stopping all >> > processes? Should we fix regular reboot syscall as well and not just >> > kexec reboot? >> >> It just occurred to me there is something very fishy about all of this. >> >> What userspace do you have using kexec (not kexec on panic) that doesn't >> preform the same userspace shutdown as a normal reboot? >> >> Quite frankly such a userspace is buggy, and arguably that is where you >> should start fixing things. > > It is a simple unit test that tests kexec support by kexec-rebooting > the kernel. I don't think SIGSTOP/SIGKILL'ing during kexec-reboot is > ideal because in a real panic-on-kexec type crash, that may not happen > and so does not emulate the real world that well. I think we want the > kexec-reboot to do a *reboot* without crashing the kernel while doing > so. Ricardo/Steve can chime on what they feel as well. This is confusing. You have a unit test that, that tests kexec on panic using a the full kexec reboot. The two are fundamentally similar but you aren't going to have a valid test case if you mix them. There is a whole kernel module that tests more interesting cases, for the simple case you probably just want to do: echo 'p' > /proc/sysrq-trigger At least I think it is p that causes a kernel-panic. That will ensure you are exercising the kexec on panic code path. That performs the minimal shutdown in the kernel. >> That way you can get the orderly shutdown >> of userspace daemons/services along with an orderly shutdown of >> everything the kernel is responsible for. > > Fixing in userspace is an option but people are not happy that the > kernel can crash like that. In a kexec on panic scenario the kernel needs to perform that absolute bare essential shutdown before calling kexec (basically nothing). During kexec-on-panic nothing can be relied upon because we don't know what is broken. If that is what you care about (as suggested by the unit test) you need to fix the device initialization. In a normal kexec scenario the whole normal reboot process is expected. I have no problems with fixing the kernel to handle that scenario, but in the real world the entire orderly shutdown both, kernel and userspace should be performed. >> At the kernel level a kexec reboot and a normal reboot have been >> deliberately kept as close as possible. Which is why I say we should >> fix it in reboot. > > You mean fix it in userspace? No. I mean in the kernel the orderly shutdown for a kexec reboot and an ordinary reboot are kept as close to the same as possible. It should be the case that the only differences between the two is that in once case system firmware takes over after the orderly shutdown, and in the other case a new kernel takes over after the orderly shutdown. Eric
On Tue, Oct 10, 2023 at 5:08 PM Eric W. Biederman <ebiederm@xmission.com> wrote: > > Joel Fernandes <joel@joelfernandes.org> writes: [...] > >> That way you can get the orderly shutdown > >> of userspace daemons/services along with an orderly shutdown of > >> everything the kernel is responsible for. > > > > Fixing in userspace is an option but people are not happy that the > > kernel can crash like that. > > In a kexec on panic scenario the kernel needs to perform that absolute > bare essential shutdown before calling kexec (basically nothing). > During kexec-on-panic nothing can be relied upon because we don't know > what is broken. If that is what you care about (as suggested by the > unit test) you need to fix the device initialization. > > In a normal kexec scenario the whole normal reboot process is expected. > I have no problems with fixing the kernel to handle that scenario, > but in the real world the entire orderly shutdown both, kernel > and userspace should be performed. Sounds good. Since you mentioned you have no problem with fixing regular reboot in the kernel, we will work on reproducing the issue with regular reboot as well and fix that. I think a syscall causing the kernel to crash instead of operate normally is a cause of concern, so let us fix the kernel as well (other than improving the test case as you mentioned). > >> At the kernel level a kexec reboot and a normal reboot have been > >> deliberately kept as close as possible. Which is why I say we should > >> fix it in reboot. > > > > You mean fix it in userspace? > > No. I mean in the kernel the orderly shutdown for a kexec reboot and an > ordinary reboot are kept as close to the same as possible. > > It should be the case that the only differences between the two is that > in once case system firmware takes over after the orderly shutdown, > and in the other case a new kernel takes over after the orderly shutdown. Agreed. thanks, - Joel
diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c index e2f2574d8b74..6599f485e42d 100644 --- a/kernel/kexec_core.c +++ b/kernel/kexec_core.c @@ -1299,6 +1299,12 @@ int kernel_kexec(void) } else #endif { + error = freeze_processes(); + if (error) { + error = -EBUSY; + goto Unlock; + } + kexec_in_progress = true; kernel_restart_prepare("kexec reboot"); migrate_to_reboot_cpu();