Message ID | 20230227023508.102230-1-yangjihong1@huawei.com |
---|---|
State | New |
Headers |
From: Yang Jihong <yangjihong1@huawei.com>
To: <peterz@infradead.org>, <mingo@redhat.com>, <acme@kernel.org>, <mark.rutland@arm.com>, <alexander.shishkin@linux.intel.com>, <jolsa@kernel.org>, <namhyung@kernel.org>, <irogers@google.com>, <eranian@google.com>, <linux-perf-users@vger.kernel.org>, <linux-kernel@vger.kernel.org>
CC: <yangjihong1@huawei.com>
Subject: [PATCH RESEND v3] perf/core: Fix hardlockup failure caused by perf throttle
Date: Mon, 27 Feb 2023 10:35:08 +0800
Message-ID: <20230227023508.102230-1-yangjihong1@huawei.com> |
Series |
[RESEND,v3] perf/core: Fix hardlockup failure caused by perf throttle
|
|
Commit Message
Yang Jihong
Feb. 27, 2023, 2:35 a.m. UTC
Commit e050e3f0a71bf ("perf: Fix broken interrupt rate throttling")
changed how the throttling threshold is evaluated. Before that commit,
hwc->interrupts was compared with max_samples_per_tick and then
incremented by 1. The commit reversed the order of these two steps,
which changed the semantics of max_samples_per_tick.
Taken literally, "max_samples_per_tick" means that an event with
hwc->interrupts == max_samples_per_tick should not be throttled, so the
condition should be "hwc->interrupts > max_samples_per_tick".

This can also make hardlockup detection fail. The minimum value of
max_samples_per_tick can be 1, and in that case
__perf_event_account_interrupt() returns 1.
As a result, the nmi_watchdog event gets throttled, which stops the PMU
(taking x86 as an example, see x86_pmu_handle_irq()).
Fixes: e050e3f0a71b ("perf: Fix broken interrupt rate throttling")
Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
---
Changes since v2:
- Add Fixes tag.
Changes since v1:
- Modify commit title.
kernel/events/core.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
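The off-by-one described in the commit message can be illustrated with a small user-space sketch. It is not kernel code: `struct counter`, `account_interrupt()` and `samples_before_throttle()` are hypothetical names, the counter is assumed to start at 0, and only the increment-then-compare branch touched by the patch is modeled. The other branch of the real function and the MAX_INTERRUPTS handling are left out.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the hwc->interrupts counter of one event. */
struct counter {
	unsigned long interrupts;
};

/*
 * Models only the increment-then-compare step from the patched branch.
 * strict == false: old comparison (>=); strict == true: patched one (>).
 * Returns 1 when the event would be throttled, 0 otherwise.
 */
static int account_interrupt(struct counter *c,
			     unsigned long max_samples_per_tick, bool strict)
{
	c->interrupts++;
	if (strict ? c->interrupts > max_samples_per_tick
		   : c->interrupts >= max_samples_per_tick)
		return 1;
	return 0;
}

/*
 * Counts how many interrupts pass without throttling, starting from a
 * zeroed counter (an assumption of this sketch, not kernel behaviour).
 */
static unsigned long samples_before_throttle(unsigned long max_samples_per_tick,
					     bool strict)
{
	struct counter c = { 0 };
	unsigned long accepted = 0;

	while (!account_interrupt(&c, max_samples_per_tick, strict))
		accepted++;
	return accepted;
}
```

Under these assumptions, `samples_before_throttle(1, false)` is 0 while `samples_before_throttle(1, true)` is 1. With ">=" and a limit of 1, the very first accounted interrupt is reported as throttled, which is the situation the commit message describes for the nmi_watchdog event. With ">" the number of interrupts that pass matches the limit.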
Comments
Hello,

PING.

Thanks,
Yang.

On 2023/2/27 10:35, Yang Jihong wrote:
> [...]
Hello,

PING.

This patch has not received a response yet.
Please take the time to check whether the fix is OK.
I look forward to your review. Thanks :)

Thanks,
Yang.

On 2023/3/6 9:14, Yang Jihong wrote:
> [...]
Hello,

PING again.
I look forward to the review.

Thanks,
Yang.

On 2023/3/22 15:36, Yang Jihong wrote:
> [...]
On Mon, Feb 27, 2023 at 10:35:08AM +0800, Yang Jihong wrote:
> commit e050e3f0a71bf ("perf: Fix broken interrupt rate throttling")
> introduces a change in throttling threshold judgment. Before this,
> compare hwc->interrupts and max_samples_per_tick, then increase
> hwc->interrupts by 1, but this commit reverses order of these two
> behaviors, causing the semantics of max_samples_per_tick to change.
> In literal sense of "max_samples_per_tick", if hwc->interrupts ==
> max_samples_per_tick, it should not be throttled, therefore, the judgment
> condition should be changed to "hwc->interrupts > max_samples_per_tick".
>
> In fact, this may cause the hardlockup to fail, The minimum value of
> max_samples_per_tick may be 1, in this case, the return value of
> __perf_event_account_interrupt function is 1.
> As a result, nmi_watchdog gets throttled, which would stop PMU (Use x86
> architecture as an example, see x86_pmu_handle_irq).
>
> Fixes: e050e3f0a71b ("perf: Fix broken interrupt rate throttling")
> Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
> ---
>  kernel/events/core.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index f79fd8b87f75..0540a8653906 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -9434,7 +9434,7 @@ __perf_event_account_interrupt(struct perf_event *event, int throttle)
>  	} else {
>  		hwc->interrupts++;
>  		if (unlikely(throttle
> -			     && hwc->interrupts >= max_samples_per_tick)) {
> +			     && hwc->interrupts > max_samples_per_tick)) {
>  			__this_cpu_inc(perf_throttled_count);
>  			tick_dep_set_cpu(smp_processor_id(), TICK_DEP_BIT_PERF_EVENTS);
>  			hwc->interrupts = MAX_INTERRUPTS;

Thanks, I've made a slight edit to fix the && placement.
diff --git a/kernel/events/core.c b/kernel/events/core.c
index f79fd8b87f75..0540a8653906 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -9434,7 +9434,7 @@ __perf_event_account_interrupt(struct perf_event *event, int throttle)
 	} else {
 		hwc->interrupts++;
 		if (unlikely(throttle
-			     && hwc->interrupts >= max_samples_per_tick)) {
+			     && hwc->interrupts > max_samples_per_tick)) {
 			__this_cpu_inc(perf_throttled_count);
 			tick_dep_set_cpu(smp_processor_id(), TICK_DEP_BIT_PERF_EVENTS);
 			hwc->interrupts = MAX_INTERRUPTS;