Message ID | 20230731091754.1.I501ab68cb926ee33a7c87e063d207abf09b9943c@changeid |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org>
From: Douglas Anderson <dianders@chromium.org>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Petr Mladek <pmladek@suse.com>, Douglas Anderson <dianders@chromium.org>, kernel test robot <lkp@intel.com>, Lecopzer Chen <lecopzer.chen@mediatek.com>, Pingfan Liu <kernelfans@gmail.com>, linux-kernel@vger.kernel.org
Subject: [PATCH] watchdog/hardlockup: Avoid large stack frames in watchdog_hardlockup_check()
Date: Mon, 31 Jul 2023 09:17:59 -0700
Message-ID: <20230731091754.1.I501ab68cb926ee33a7c87e063d207abf09b9943c@changeid>
X-Mailer: git-send-email 2.41.0.487.g6d72f3e995-goog
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
List-ID: <linux-kernel.vger.kernel.org> |
Series |
watchdog/hardlockup: Avoid large stack frames in watchdog_hardlockup_check()
|
|
Commit Message
Doug Anderson
July 31, 2023, 4:17 p.m. UTC
After commit 77c12fc95980 ("watchdog/hardlockup: add a "cpu" param to
watchdog_hardlockup_check()") we started storing a `struct cpumask` on
the stack in watchdog_hardlockup_check(). On systems with
CONFIG_NR_CPUS set to 8192 this takes up 1K on the stack. That
triggers warnings with `CONFIG_FRAME_WARN` set to 1024.
Instead of putting this `struct cpumask` on the stack, let's declare
it as `static`. This has the downside of taking up 1K of memory all
the time on systems with `CONFIG_NR_CPUS` set to 8192, but on systems with
smaller `CONFIG_NR_CPUS` it's not much memory (with 128 CPUs it's only
16 bytes of memory). Presumably anyone building a system with
`CONFIG_NR_CPUS=8192` can afford the extra 1K of memory.
NOTE: as part of this change, we no longer check the return value of
trigger_single_cpu_backtrace(). While we could do this and only call
cpumask_clear_cpu() if trigger_single_cpu_backtrace() didn't fail,
that's probably not worth it. There's no reason to believe that
trigger_cpumask_backtrace() will succeed at backtracing the CPU when
trigger_single_cpu_backtrace() failed.
Alternatives considered:
- Use kmalloc with GFP_ATOMIC to allocate. I decided against this
since relying on kmalloc when the system is hard locked up seems
like a bad idea.
- Change the arch_trigger_cpumask_backtrace() across all architectures
to take an extra parameter to get the needed behavior. This seems
like a lot of churn for a small savings.
Fixes: 77c12fc95980 ("watchdog/hardlockup: add a "cpu" param to watchdog_hardlockup_check()")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/r/202307310955.pLZDhpnl-lkp@intel.com
Signed-off-by: Douglas Anderson <dianders@chromium.org>
---
kernel/watchdog.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)
Comments
On Mon 31-07-23 09:17:59, Douglas Anderson wrote:
[...]
> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> index be38276a365f..19db2357969a 100644
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -151,9 +151,6 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
>  	 */
>  	if (is_hardlockup(cpu)) {
>  		unsigned int this_cpu = smp_processor_id();
> -		struct cpumask backtrace_mask;
> -
> -		cpumask_copy(&backtrace_mask, cpu_online_mask);
>  
>  		/* Only print hardlockups once. */
>  		if (per_cpu(watchdog_hardlockup_warned, cpu))
> @@ -167,10 +164,8 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
>  				show_regs(regs);
>  			else
>  				dump_stack();
> -			cpumask_clear_cpu(cpu, &backtrace_mask);
>  		} else {
> -			if (trigger_single_cpu_backtrace(cpu))
> -				cpumask_clear_cpu(cpu, &backtrace_mask);
> +			trigger_single_cpu_backtrace(cpu);
>  		}
>  
>  		/*
> @@ -178,8 +173,13 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
>  		 * hardlockups generating interleaving traces
>  		 */
>  		if (sysctl_hardlockup_all_cpu_backtrace &&
> -		    !test_and_set_bit(0, &watchdog_hardlockup_all_cpu_dumped))
> +		    !test_and_set_bit(0, &watchdog_hardlockup_all_cpu_dumped)) {
> +			static struct cpumask backtrace_mask;
> +
> +			cpumask_copy(&backtrace_mask, cpu_online_mask);
> +			cpumask_clear_cpu(cpu, &backtrace_mask);
>  			trigger_cpumask_backtrace(&backtrace_mask);
> +		}
>  
>  		if (hardlockup_panic)
>  			nmi_panic(regs, "Hard LOCKUP");

This looks rather wasteful to just copy the cpumask over to backtrace_mask in nmi_trigger_cpumask_backtrace() (which all but the sparc arches do AFAICS).

Would it be possible to use arch_trigger_cpumask_backtrace(cpu_online_mask, false) and special-case cpu != this_cpu && sysctl_hardlockup_all_cpu_backtrace?
Hi,

On Tue, Aug 1, 2023 at 5:58 AM Michal Hocko <mhocko@suse.com> wrote:
[...]
> This looks rather wasteful to just copy the cpumask over to
> backtrace_mask in nmi_trigger_cpumask_backtrace() (which all but the
> sparc arches do AFAICS).
>
> Would it be possible to use arch_trigger_cpumask_backtrace(cpu_online_mask, false)
> and special-case cpu != this_cpu && sysctl_hardlockup_all_cpu_backtrace?

So you're saying optimize the case where cpu == this_cpu and then have a special case (where we still copy) for cpu != this_cpu? I can do that if that's what people want, but (assuming I understand correctly) that's making the wrong tradeoff. Specifically, this code runs one time right as we're crashing, and if it takes an extra millisecond to run it's not a huge deal. It feels better to avoid the special case and keep the code smaller.

Let me know if I misunderstood.

-Doug
On Tue 01-08-23 07:16:15, Doug Anderson wrote:
[...]
> So you're saying optimize the case where cpu == this_cpu and then have
> a special case (where we still copy) for cpu != this_cpu? I can do
> that if that's what people want, but (assuming I understand correctly)
> that's making the wrong tradeoff. Specifically, this code runs one
> time right as we're crashing and if it takes an extra millisecond to
> run it's not a huge deal. It feels better to avoid the special case
> and keep the code smaller.
>
> Let me know if I misunderstood.

I meant something like this (untested, but it should give an idea of what I mean, hopefully). Maybe it can be simplified even further. TBH I am not entirely sure why cpu == this_cpu needs this special handling.
---
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index be38276a365f..0eedac9f1019 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -151,9 +151,7 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
 	 */
 	if (is_hardlockup(cpu)) {
 		unsigned int this_cpu = smp_processor_id();
-		struct cpumask backtrace_mask;
-
-		cpumask_copy(&backtrace_mask, cpu_online_mask);
+		bool dump_all = false;
 
 		/* Only print hardlockups once. */
 		if (per_cpu(watchdog_hardlockup_warned, cpu))
@@ -167,10 +165,6 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
 				show_regs(regs);
 			else
 				dump_stack();
-			cpumask_clear_cpu(cpu, &backtrace_mask);
-		} else {
-			if (trigger_single_cpu_backtrace(cpu))
-				cpumask_clear_cpu(cpu, &backtrace_mask);
 		}
 
 		/*
@@ -179,7 +173,12 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
 		 */
 		if (sysctl_hardlockup_all_cpu_backtrace &&
 		    !test_and_set_bit(0, &watchdog_hardlockup_all_cpu_dumped))
-			trigger_cpumask_backtrace(&backtrace_mask);
+			dump_all = true;
+
+		if (dump_all)
+			arch_trigger_cpumask_backtrace(cpu_online_mask, cpu != this_cpu);
+		else if (cpu != this_cpu)
+			trigger_single_cpu_backtrace(cpu);
 
 		if (hardlockup_panic)
 			nmi_panic(regs, "Hard LOCKUP");
Hi,

On Tue, Aug 1, 2023 at 8:26 AM Michal Hocko <mhocko@suse.com> wrote:
[...]
> I meant something like this (untested, but it should give an idea of
> what I mean, hopefully). Maybe it can be simplified even further. TBH
> I am not entirely sure why cpu == this_cpu needs this special handling.
[...]
> +		if (dump_all)
> +			arch_trigger_cpumask_backtrace(cpu_online_mask, cpu != this_cpu);
> +		else if (cpu != this_cpu)
> +			trigger_single_cpu_backtrace(cpu);

Ah, I see what you mean. The one issue I have with your solution is that the ordering of the stack crawls is less ideal in the "dump all" case when cpu != this_cpu. We really want to see the stack crawl of the locked up CPU first and _then_ see the stack crawls of other CPUs. With your solution the locked up CPU will be interspersed with all the others and will be harder to find in the output (you've got to match it up with the "Watchdog detected hard LOCKUP on cpu N" message). While that's probably not a huge deal, it's nicer to make the output easy to understand for someone trying to parse it...

-Doug
On Tue 01-08-23 08:41:49, Doug Anderson wrote:
[...]
> While that's probably not a huge deal, it's nicer to make the output
> easy to understand for someone trying to parse it...

Is it worth wasting memory for this arguably nicer output? Identifying the stack of the locked up CPU is trivial.
Hi,

On Wed, Aug 2, 2023 at 12:27 AM Michal Hocko <mhocko@suse.com> wrote:
[...]
> Is it worth wasting memory for this arguably nicer output? Identifying
> the stack of the locked up CPU is trivial.

I guess it's debatable, but as someone who has spent time trawling through reports generated like this, I'd say "yes", it's super helpful in understanding the problem to have the hung CPU first.

Putting the memory usage in perspective:

* On a kernel built with a more normal number of max CPUs, like 256, this is only a use of 32 bytes of memory. That's 8 CPU instructions worth of memory.

* Even on a system with the largest number of max CPUs we currently allow (8192), this is only a use of 1024 bytes of memory. Sure, that's a big chunk, but this is also something on our largest systems.

In any case, how about this: we only need the memory allocated if `sysctl_hardlockup_all_cpu_backtrace` is non-zero. I can hook in whenever that's changed (should be just at bootup) and then kmalloc memory then. This really limits the extra memory to just cases when it's useful. Presumably on systems that are designed to run massively SMP they wouldn't want to turn this knob on anyway since it would spew far too much data. If you took a kernel compiled for max SMP, ran it on a machine with only a few cores, and wanted this feature turned on, then at most you'd be chewing up 1K. In the average case this would chew up some extra memory (extra CPU instructions to implement the function take code space, extra overhead around kmalloc) but it would avoid the 1K chunk in most cases.

Does that sound reasonable?

I guess my last alternative would be to keep the special case of tracing the hung CPU first (this is the most important part IMO) and then accept the double trace, AKA:

	/* Try to avoid re-dumping the stack on the hung CPU if possible */
	if (cpu == this_cpu)
		trigger_allbutself_cpu_backtrace();
	else
		trigger_all_cpu_backtrace();

-Doug
On Wed 02-08-23 07:12:29, Doug Anderson wrote:
> Hi,
>
> On Wed, Aug 2, 2023 at 12:27 AM Michal Hocko <mhocko@suse.com> wrote:
> >
> > On Tue 01-08-23 08:41:49, Doug Anderson wrote:
> > [...]
> > > Ah, I see what you mean. The one issue I have with your solution is
> > > that the ordering of the stack crawls is less ideal in the "dump all"
> > > case when cpu != this_cpu. We really want to see the stack crawl of
> > > the locked up CPU first and _then_ see the stack crawls of other CPUs.
> > > With your solution the locked up CPU will be interspersed with all the
> > > others and will be harder to find in the output (you've got to match
> > > it up with the "Watchdog detected hard LOCKUP on cpu N" message).
> > > While that's probably not a huge deal, it's nicer to make the output
> > > easy to understand for someone trying to parse it...
> >
> > Is it worth to waste memory for this arguably nicer output? Identifying
> > the stack of the locked up CPU is trivial.
>
> I guess it's debatable, but as someone who has spent time trawling
> through reports generated like this, I'd say "yes": it's super helpful
> in understanding the problem to have the hung CPU first.

Well, I have to admit that most hardlockup splats I have dealt with
recently do not come with sysctl_hardlockup_all_cpu_backtrace enabled,
so I cannot really judge.

> Putting the memory usage in perspective:
>
> * On a kernel built with a more normal number of max CPUs, like 256,
>   this is only 32 bytes of memory. That's 8 CPU instructions worth of
>   memory.

Think of distribution kernels that many people use. E.g. the SLES
kernel uses CONFIG_NR_CPUS=8192.

> * Even on a system with the largest number of max CPUs we currently
>   allow (8192), this is only 1024 bytes of memory. Sure, that's a big
>   chunk, but it's also only needed on our largest systems.

This is independent of the size of the machine if you are using
pre-built kernels.

> In any case, how about this:
> we only need the memory allocated if
> `sysctl_hardlockup_all_cpu_backtrace` is non-zero. I can hook in
> whenever that's changed (should be just at bootup) and kmalloc the
> memory then.

This is certainly better than the original proposal.

> This really limits the extra memory to just the cases
> where it's useful. Presumably on systems that are designed to run
> massively SMP they wouldn't want to turn this knob on anyway since it
> would spew far too much data. If you took a kernel compiled for max
> SMP, ran it on a machine with only a few cores, and wanted this
> feature turned on, then at most you'd be chewing up 1K. In the average
> case this would chew up some extra memory (extra CPU instructions to
> implement the function take code space, extra overhead around kmalloc)
> but it would avoid the 1K chunk in most cases.
>
> Does that sound reasonable?

If having the locked up CPU first is a real requirement (and this seems
debatable) then sure, why not. I do not feel strongly enough to argue
one way or the other; maybe others have an opinion on that.

> I guess my last alternative would be to keep the special case of
> tracing the hung CPU first (this is the most important part IMO) and
> then accept the double trace, AKA:

That sounds wrong.

> /* Try to avoid re-dumping the stack on the hung CPU if possible */
> if (cpu == this_cpu)
> 	trigger_allbutself_cpu_backtrace();
> else
> 	trigger_all_cpu_backtrace();
>
> -Doug
On Wed 2023-08-02 07:12:29, Doug Anderson wrote:
> Hi,
>
> On Wed, Aug 2, 2023 at 12:27 AM Michal Hocko <mhocko@suse.com> wrote:
> >
> > On Tue 01-08-23 08:41:49, Doug Anderson wrote:
> > [...]
> > > Ah, I see what you mean. The one issue I have with your solution is
> > > that the ordering of the stack crawls is less ideal in the "dump all"
> > > case when cpu != this_cpu. We really want to see the stack crawl of
> > > the locked up CPU first and _then_ see the stack crawls of other CPUs.
> > > With your solution the locked up CPU will be interspersed with all the
> > > others and will be harder to find in the output (you've got to match
> > > it up with the "Watchdog detected hard LOCKUP on cpu N" message).
> > > While that's probably not a huge deal, it's nicer to make the output
> > > easy to understand for someone trying to parse it...
> >
> > Is it worth to waste memory for this arguably nicer output? Identifying
> > the stack of the locked up CPU is trivial.
>
> I guess it's debatable, but as someone who has spent time trawling
> through reports generated like this, I'd say "yes": it's super helpful
> in understanding the problem to have the hung CPU first.
> Putting the memory usage in perspective:

nmi_trigger_cpumask_backtrace() has its own copy of the cpu mask.
What about changing the @exclude_self parameter to @exclude_cpu
and doing:

	if (exclude_cpu >= 0)
		cpumask_clear_cpu(exclude_cpu, to_cpumask(backtrace_mask));

It would require changing also arch_trigger_cpumask_backtrace() to

	void arch_trigger_cpumask_backtrace(const struct cpumask *mask,
					    int exclude_cpu);

but it looks doable.

It might theoretically change the behavior because
nmi_trigger_cpumask_backtrace() does

	this_cpu = get_cpu();

I guess that it was used to make sure that smp_processor_id() stays
the same. But it is too late for that; it should simply exclude the
caller's CPU. So this might actually fix a possible race.

Note that get_cpu() also disables preemption. IMHO, that is not
strictly needed.
But it might make sense to keep it, because the backtraces are printed
when something goes wrong and they should show the current state.

Best Regards,
Petr
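Petr's @exclude_self-to-@exclude_cpu idea can be sketched in userspace C as follows. This is only a model: `build_backtrace_mask` is a hypothetical name, a `uint64_t` bitmask stands in for `struct cpumask`, and the bit-clear stands in for `cpumask_clear_cpu()`.

```c
#include <stdint.h>

typedef uint64_t cpumask_t;  /* toy stand-in for struct cpumask */

/*
 * exclude_cpu >= 0: clear that CPU from a local copy of the mask,
 * as nmi_trigger_cpumask_backtrace() could do with its own copy.
 * exclude_cpu < 0: trace every CPU in the requested mask.
 */
cpumask_t build_backtrace_mask(cpumask_t requested, int exclude_cpu)
{
	cpumask_t backtrace_mask = requested;  /* local copy */

	if (exclude_cpu >= 0)
		backtrace_mask &= ~(1ULL << exclude_cpu);
	return backtrace_mask;
}
```

Because the exclusion uses the caller-supplied CPU number rather than whatever `get_cpu()` returns at trigger time, the result no longer depends on which CPU ends up running the function, which is the race Petr alludes to.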
On Thu 03-08-23 10:12:12, Petr Mladek wrote:
> On Wed 2023-08-02 07:12:29, Doug Anderson wrote:
> > Hi,
> >
> > On Wed, Aug 2, 2023 at 12:27 AM Michal Hocko <mhocko@suse.com> wrote:
> > >
> > > On Tue 01-08-23 08:41:49, Doug Anderson wrote:
> > > [...]
> > > > Ah, I see what you mean. The one issue I have with your solution is
> > > > that the ordering of the stack crawls is less ideal in the "dump all"
> > > > case when cpu != this_cpu. We really want to see the stack crawl of
> > > > the locked up CPU first and _then_ see the stack crawls of other CPUs.
> > > > With your solution the locked up CPU will be interspersed with all the
> > > > others and will be harder to find in the output (you've got to match
> > > > it up with the "Watchdog detected hard LOCKUP on cpu N" message).
> > > > While that's probably not a huge deal, it's nicer to make the output
> > > > easy to understand for someone trying to parse it...
> > >
> > > Is it worth to waste memory for this arguably nicer output? Identifying
> > > the stack of the locked up CPU is trivial.
> >
> > I guess it's debatable, but as someone who has spent time trawling
> > through reports generated like this, I'd say "yes": it's super helpful
> > in understanding the problem to have the hung CPU first.
> > Putting the memory usage in perspective:
>
> nmi_trigger_cpumask_backtrace() has its own copy of the cpu mask.
> What about changing the @exclude_self parameter to @exclude_cpu
> and doing:
>
> 	if (exclude_cpu >= 0)
> 		cpumask_clear_cpu(exclude_cpu, to_cpumask(backtrace_mask));
>
> It would require changing also arch_trigger_cpumask_backtrace() to
>
> 	void arch_trigger_cpumask_backtrace(const struct cpumask *mask,
> 					    int exclude_cpu);
>
> but it looks doable.

Yes, but sparc is doing its own thing so it would require changing that
as well. But this looks reasonable as well.
Hi,

On Thu, Aug 3, 2023 at 1:30 AM Michal Hocko <mhocko@suse.com> wrote:
>
> On Thu 03-08-23 10:12:12, Petr Mladek wrote:
> > On Wed 2023-08-02 07:12:29, Doug Anderson wrote:
> > > Hi,
> > >
> > > On Wed, Aug 2, 2023 at 12:27 AM Michal Hocko <mhocko@suse.com> wrote:
> > > >
> > > > On Tue 01-08-23 08:41:49, Doug Anderson wrote:
> > > > [...]
> > > > > Ah, I see what you mean. The one issue I have with your solution is
> > > > > that the ordering of the stack crawls is less ideal in the "dump all"
> > > > > case when cpu != this_cpu. We really want to see the stack crawl of
> > > > > the locked up CPU first and _then_ see the stack crawls of other CPUs.
> > > > > With your solution the locked up CPU will be interspersed with all the
> > > > > others and will be harder to find in the output (you've got to match
> > > > > it up with the "Watchdog detected hard LOCKUP on cpu N" message).
> > > > > While that's probably not a huge deal, it's nicer to make the output
> > > > > easy to understand for someone trying to parse it...
> > > >
> > > > Is it worth to waste memory for this arguably nicer output? Identifying
> > > > the stack of the locked up CPU is trivial.
> > >
> > > I guess it's debatable, but as someone who has spent time trawling
> > > through reports generated like this, I'd say "yes": it's super helpful
> > > in understanding the problem to have the hung CPU first.
> > > Putting the memory usage in perspective:
> >
> > nmi_trigger_cpumask_backtrace() has its own copy of the cpu mask.
> > What about changing the @exclude_self parameter to @exclude_cpu
> > and doing:
> >
> > 	if (exclude_cpu >= 0)
> > 		cpumask_clear_cpu(exclude_cpu, to_cpumask(backtrace_mask));
> >
> > It would require changing also arch_trigger_cpumask_backtrace() to
> >
> > 	void arch_trigger_cpumask_backtrace(const struct cpumask *mask,
> > 					    int exclude_cpu);
> >
> > but it looks doable.
>
> Yes, but sparc is doing its own thing so it would require changing that
> as well.
> But this looks reasonable as well.

OK. I've tried a v3 with that:

https://lore.kernel.org/r/20230803160649.v3.2.I501ab68cb926ee33a7c87e063d207abf09b9943c@changeid

-Doug
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index be38276a365f..19db2357969a 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -151,9 +151,6 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
 	 */
 	if (is_hardlockup(cpu)) {
 		unsigned int this_cpu = smp_processor_id();
-		struct cpumask backtrace_mask;
-
-		cpumask_copy(&backtrace_mask, cpu_online_mask);
 
 		/* Only print hardlockups once. */
 		if (per_cpu(watchdog_hardlockup_warned, cpu))
@@ -167,10 +164,8 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
 				show_regs(regs);
 			else
 				dump_stack();
-			cpumask_clear_cpu(cpu, &backtrace_mask);
 		} else {
-			if (trigger_single_cpu_backtrace(cpu))
-				cpumask_clear_cpu(cpu, &backtrace_mask);
+			trigger_single_cpu_backtrace(cpu);
 		}
 
 		/*
@@ -178,8 +173,13 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
 		 * hardlockups generating interleaving traces
 		 */
 		if (sysctl_hardlockup_all_cpu_backtrace &&
-		    !test_and_set_bit(0, &watchdog_hardlockup_all_cpu_dumped))
+		    !test_and_set_bit(0, &watchdog_hardlockup_all_cpu_dumped)) {
+			static struct cpumask backtrace_mask;
+
+			cpumask_copy(&backtrace_mask, cpu_online_mask);
+			cpumask_clear_cpu(cpu, &backtrace_mask);
 			trigger_cpumask_backtrace(&backtrace_mask);
+		}
 
 		if (hardlockup_panic)
 			nmi_panic(regs, "Hard LOCKUP");
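The control flow of the patched branch can be modeled in userspace C. This is a sketch only: `all_cpu_backtrace_mask` is a hypothetical name, a `uint64_t` bitmask stands in for `struct cpumask`, and a plain bool models the `test_and_set_bit()` latch on `watchdog_hardlockup_all_cpu_dumped`.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t cpumask_t;  /* toy stand-in for struct cpumask */

static bool all_cpu_dumped;  /* models watchdog_hardlockup_all_cpu_dumped */

/*
 * Returns the mask that would be passed to trigger_cpumask_backtrace(),
 * or 0 when no all-CPU dump should happen: the mask is only built when
 * the sysctl is enabled, the hung CPU is cleared so its stack is not
 * dumped twice, and the latch ensures the dump happens only once.
 */
cpumask_t all_cpu_backtrace_mask(cpumask_t online, int hung_cpu,
				 int sysctl_all_backtrace)
{
	if (!sysctl_all_backtrace)
		return 0;
	if (all_cpu_dumped)	/* test_and_set_bit() already latched */
		return 0;
	all_cpu_dumped = true;
	return online & ~(1ULL << hung_cpu);  /* cpumask_clear_cpu() */
}
```

Making `backtrace_mask` static and scoped to the once-only branch is what removes the per-call stack copy the earlier messages were debating, at the cost of the mask living in .bss permanently.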