Message ID | 20230204193345.842-1-shiju.jose@huawei.com |
---|---|
State | New |
Headers | From: <shiju.jose@huawei.com> To: <mchehab@kernel.org>, <linux-edac@vger.kernel.org> CC: <rostedt@goodmis.org>, <mhiramat@kernel.org>, <linux-kernel@vger.kernel.org>, <linux-trace-kernel@vger.kernel.org>, <tanxiaofei@huawei.com>, <jonathan.cameron@huawei.com>, <linuxarm@huawei.com>, <shiju.jose@huawei.com> Subject: [RFC PATCH V2 1/1] rasdaemon: Fix poll() on per_cpu trace_pipe_raw blocks indefinitely Date: Sat, 4 Feb 2023 19:33:45 +0000 Message-ID: <20230204193345.842-1-shiju.jose@huawei.com> X-Mailer: git-send-email 2.26.0.windows.1 MIME-Version: 1.0 Content-Transfer-Encoding: 7BIT Content-Type: text/plain; charset=US-ASCII Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org |
Series |
[RFC,V2,1/1] rasdaemon: Fix poll() on per_cpu trace_pipe_raw blocks indefinitely
|
|
Commit Message
Shiju Jose
Feb. 4, 2023, 7:33 p.m. UTC
From: Shiju Jose <shiju.jose@huawei.com>

Error events have not been received by rasdaemon since kernel 6.1-rc6.
The issue was first detected and reported while testing CXL error
events in rasdaemon.

Debugging showed that poll() on trace_pipe_raw in ras-events.c does not
return, and that the issue appears after commit
42fb0a1e84ff525ebe560e2baf9451ab69127e2b ("tracing/ring-buffer: Have
polling block on watermark").

This was also verified using a test application for poll() and select()
on trace_pipe_raw.

A bug has also been reported for this issue:
https://lore.kernel.org/all/31eb3b12-3350-90a4-a0d9-d1494db7cf74@oracle.com/

The issue occurs in the per_cpu case, which calls
ring_buffer_poll_wait() in kernel/trace/ring_buffer.c with
buffer_percent > 0 and then waits until that percentage of pages is
available. The default value of buffer_percent is 50, set in
kernel/trace/trace.c. However, poll() does not return even when the
percentage-of-pages condition is met.

As a fix, rasdaemon sets buffer_percent to 0 through
/sys/kernel/debug/tracing/instances/rasdaemon/buffer_percent. The task
then wakes up as soon as data is added to any of the per-CPU buffers,
and poll() on per_cpu/cpuX/trace_pipe_raw no longer blocks indefinitely.

Dependency on the kernel RFC patch
tracing: Fix poll() and select() do not work on per_cpu trace_pipe and trace_pipe_raw

Signed-off-by: Shiju Jose <shiju.jose@huawei.com>

Changes:
RFC V1 -> RFC V2
1. Rename the patch header subject.
2. Changes for backward compatibility with old kernels.
---
 ras-events.c | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)
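The watermark behaviour described in the commit message can be pictured with a small model. This is only an illustrative sketch, not the kernel's code: the function name reader_should_wake() and its parameters are hypothetical, and the exact comparison the kernel uses in kernel/trace/ring_buffer.c may differ (for example in rounding or in strict versus non-strict comparison). It shows why a buffer_percent of 0 lets a reader wake on the first event, while the default of 50 makes it wait for half of the pages.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Simplified model of the wake-up decision for a reader blocked in poll()
 * on a per-CPU trace_pipe_raw. Hypothetical helper, not kernel code.
 *
 * dirty_pages    - pages of the per-CPU buffer that currently hold data
 * total_pages    - number of pages in the per-CPU buffer
 * buffer_percent - watermark written to the buffer_percent file (0..100)
 * has_data       - whether any event is present at all
 */
static bool reader_should_wake(size_t dirty_pages, size_t total_pages,
			       unsigned int buffer_percent, bool has_data)
{
	if (!has_data)
		return false;

	/* A watermark of 0 means: wake as soon as any data is available. */
	if (buffer_percent == 0 || total_pages == 0)
		return true;

	/* Otherwise wait until the requested share of pages is filled. */
	return dirty_pages * 100 >= (size_t)buffer_percent * total_pages;
}
```

With this model, one event in an 8-page buffer wakes the reader when buffer_percent is 0, but not when it is 50, which matters for rasdaemon because error events are rare and should be reported promptly.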
Comments
Linux regression tracking (Thorsten Leemhuis)
Feb. 16, 2023, 11:47 a.m. UTC |
#1
Hi, this is your Linux kernel regression tracker.

On 04.02.23 20:33, shiju.jose@huawei.com wrote:
> From: Shiju Jose <shiju.jose@huawei.com>
>
> The error events are not received in the rasdaemon since kernel 6.1-rc6.
> This issue is firstly detected and reported, when testing the CXL error
> events in the rasdaemon.

Thanks for working on this. This submission looks stalled, unless I
missed something. This is unfortunate, as this afaics is fixing a
regression (caused by a commit from Steven). Hence it would be good to
get this fixed rather sooner than later. Or is the RFC in the subject
the reason why there was no progress? Is it maybe time to remove it?

> Debugging showed, poll() on trace_pipe_raw in the ras-events.c do not
> return and this issue is seen after the commit
> 42fb0a1e84ff525ebe560e2baf9451ab69127e2b ("tracing/ring-buffer: Have
> polling block on watermark").
>
> This also verified using a test application for poll()
> and select() on trace_pipe_raw.
>
> There is also a bug reported on this issue,
> https://lore.kernel.org/all/31eb3b12-3350-90a4-a0d9-d1494db7cf74@oracle.com/
> This issue occurs for the per_cpu case, which calls the
> ring_buffer_poll_wait(), in kernel/trace/ring_buffer.c, with the
> buffer_percent > 0 and then wait until the percentage of pages are
> available.The default value set for the buffer_percent is 50 in the
> kernel/trace/trace.c. However poll() does not return even met the percentage
> of pages condition.
>
> As a fix, rasdaemon set buffer_percent as 0 through the
> /sys/kernel/debug/tracing/instances/rasdaemon/buffer_percent, then the
> task will wake up as soon as data is added to any of the specific cpu
> buffer and poll() on per_cpu/cpuX/trace_pipe_raw does not block
> indefinitely.
>
> Dependency on the kernel RFC patch
> tracing: Fix poll() and select() do not work on per_cpu trace_pipe and trace_pipe_raw

BTW, this patch afaics should have these tags:

Fixes: 42fb0a1e84ff ("tracing/ring-buffer: Have polling block on watermark")
Reported-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
Link: https://lore.kernel.org/r/31eb3b12-3350-90a4-a0d9-d1494db7cf74@oracle.com/

And likely a

Cc: <stable@vger.kernel.org> # 6.1.x

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

#regzbot poke
#regzbot ^backmonitor:
https://lore.kernel.org/r/31eb3b12-3350-90a4-a0d9-d1494db7cf74@oracle.com/

> Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
>
> Changes:
> RFC V1 -> RFC V2
> 1. Rename the patch header subject.
> 2. Changes for the backward compatability to the old kernels.
> ---
>  ras-events.c | 22 ++++++++++++++++++++++
>  1 file changed, 22 insertions(+)
>
> diff --git a/ras-events.c b/ras-events.c
> index 3691311..e505a0e 100644
> --- a/ras-events.c
> +++ b/ras-events.c
> @@ -383,6 +383,8 @@ static int read_ras_event_all_cpus(struct pthread_data *pdata,
>  	int warnonce[n_cpus];
>  	char pipe_raw[PATH_MAX];
>  	int legacy_kernel = 0;
> +	int fd;
> +	char buf[10];
>  #if 0
>  	int need_sleep = 0;
>  #endif
> @@ -402,6 +404,26 @@ static int read_ras_event_all_cpus(struct pthread_data *pdata,
>  		return -ENOMEM;
>  	}
>
> +	/* Fix for poll() on the per_cpu trace_pipe and trace_pipe_raw blocks
> +	 * indefinitely with the default buffer_percent in the kernel trace system,
> +	 * which is introduced by the following change in the kernel.
> +	 * https://lore.kernel.org/all/20221020231427.41be3f26@gandalf.local.home/T/#u.
> +	 * Set buffer_percent to 0 so that poll() will return immediately
> +	 * when the trace data is available in the ras per_cpu trace pipe_raw
> +	 */
> +	fd = open_trace(pdata[0].ras, "buffer_percent", O_WRONLY);
> +	if (fd >= 0) {
> +		/* For the backward compatabilty to the old kernel, do not return
> +		 * if fail to set the buffer_percent.
> +		 */
> +		snprintf(buf, sizeof(buf), "0");
> +		size = write(fd, buf, strlen(buf));
> +		if (size <= 0)
> +			log(TERM, LOG_WARNING, "can't write to buffer_percent\n");
> +		close(fd);
> +	} else
> +		log(TERM, LOG_WARNING, "Can't open buffer_percent\n");
> +
>  	for (i = 0; i < (n_cpus + 1); i++)
>  		fds[i].fd = -1;
>
Shiju Jose
Feb. 16, 2023, 1:40 p.m. UTC |
#2
Hello,

>-----Original Message-----
>From: Linux regression tracking (Thorsten Leemhuis)
><regressions@leemhuis.info>
>Sent: 16 February 2023 11:48
>To: rostedt@goodmis.org
>Cc: mhiramat@kernel.org; linux-kernel@vger.kernel.org;
>linux-trace-kernel@vger.kernel.org; tanxiaofei <tanxiaofei@huawei.com>;
>Jonathan Cameron <jonathan.cameron@huawei.com>; Linuxarm
><linuxarm@huawei.com>; Linux kernel regressions list
><regressions@lists.linux.dev>; Shiju Jose <shiju.jose@huawei.com>;
>mchehab@kernel.org; linux-edac@vger.kernel.org
>Subject: Re: [RFC PATCH V2 1/1] rasdaemon: Fix poll() on per_cpu
>trace_pipe_raw blocks indefinitely
>
>Hi, this is your Linux kernel regression tracker.

Kernel fix patch for this issue is already in the mainline. Please see the commit
3e46d910d8acf94e5360126593b68bf4fee4c4a1
("tracing: Fix poll() and select() do not work on per_cpu trace_pipe and trace_pipe_raw")

>
>On 04.02.23 20:33, shiju.jose@huawei.com wrote:
>> From: Shiju Jose <shiju.jose@huawei.com>
>>
>> The error events are not received in the rasdaemon since kernel 6.1-rc6.
>> This issue is firstly detected and reported, when testing the CXL
>> error events in the rasdaemon.
>
>Thanks for working on this. This submission looks stalled, unless I missed
>something. This is unfortunate, as this afaics is fixing a regression (caused by a
>commit from Steven). Hence it would be good to get this fixed rather sooner
>than later. Or is the RFC in the subject the reason why there was no progress? Is
>it maybe time to remove it?

I made the pull request for this rasdaemon patch here,
https://github.com/mchehab/rasdaemon/pull/86

>
>> Debugging showed, poll() on trace_pipe_raw in the ras-events.c do not
>> return and this issue is seen after the commit
>> 42fb0a1e84ff525ebe560e2baf9451ab69127e2b ("tracing/ring-buffer: Have
>> polling block on watermark").
>>
>> This also verified using a test application for poll() and select() on
>> trace_pipe_raw.
>>
>> There is also a bug reported on this issue,
>> https://lore.kernel.org/all/31eb3b12-3350-90a4-a0d9-d1494db7cf74@oracle.com/
>
>> This issue occurs for the per_cpu case, which calls the
>> ring_buffer_poll_wait(), in kernel/trace/ring_buffer.c, with the
>> buffer_percent > 0 and then wait until the percentage of pages are
>> available.The default value set for the buffer_percent is 50 in the
>> kernel/trace/trace.c. However poll() does not return even met the
>> percentage of pages condition.
>>
>> As a fix, rasdaemon set buffer_percent as 0 through the
>> /sys/kernel/debug/tracing/instances/rasdaemon/buffer_percent, then the
>> task will wake up as soon as data is added to any of the specific cpu
>> buffer and poll() on per_cpu/cpuX/trace_pipe_raw does not block
>> indefinitely.
>>
>> Dependency on the kernel RFC patch
>> tracing: Fix poll() and select() do not work on per_cpu trace_pipe and
>> trace_pipe_raw
>
>BTW, this patch afaics should have these tags:
>
>Fixes: 42fb0a1e84ff ("tracing/ring-buffer: Have polling block on watermark")
>Reported-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
>Link:
>https://lore.kernel.org/r/31eb3b12-3350-90a4-a0d9-d1494db7cf74@oracle.com/

Yes. I had given the link in the patch header.

>
>An likely a
>
>Cc: <stable@vger.kernel.org> # 6.1.x
>
>Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
>--
>Everything you wanna know about Linux kernel regression tracking:
>https://linux-regtracking.leemhuis.info/about/#tldr
>If I did something stupid, please tell me, as explained on that page.
>
>#regzbot poke
>#regzbot ^backmonitor:
>https://lore.kernel.org/r/31eb3b12-3350-90a4-a0d9-d1494db7cf74@oracle.com/
>
>> Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
>>
>> Changes:
>> RFC V1 -> RFC V2
>> 1. Rename the patch header subject.
>> 2. Changes for the backward compatability to the old kernels.
>> ---
>>  ras-events.c | 22 ++++++++++++++++++++++
>>  1 file changed, 22 insertions(+)
>>
>> diff --git a/ras-events.c b/ras-events.c
>> index 3691311..e505a0e 100644
>> --- a/ras-events.c
>> +++ b/ras-events.c
>> @@ -383,6 +383,8 @@ static int read_ras_event_all_cpus(struct pthread_data *pdata,
>>  	int warnonce[n_cpus];
>>  	char pipe_raw[PATH_MAX];
>>  	int legacy_kernel = 0;
>> +	int fd;
>> +	char buf[10];
>>  #if 0
>>  	int need_sleep = 0;
>>  #endif
>> @@ -402,6 +404,26 @@ static int read_ras_event_all_cpus(struct pthread_data *pdata,
>>  		return -ENOMEM;
>>  	}
>>
>> +	/* Fix for poll() on the per_cpu trace_pipe and trace_pipe_raw blocks
>> +	 * indefinitely with the default buffer_percent in the kernel trace system,
>> +	 * which is introduced by the following change in the kernel.
>> +	 * https://lore.kernel.org/all/20221020231427.41be3f26@gandalf.local.home/T/#u.
>> +	 * Set buffer_percent to 0 so that poll() will return immediately
>> +	 * when the trace data is available in the ras per_cpu trace pipe_raw
>> +	 */
>> +	fd = open_trace(pdata[0].ras, "buffer_percent", O_WRONLY);
>> +	if (fd >= 0) {
>> +		/* For the backward compatabilty to the old kernel, do not return
>> +		 * if fail to set the buffer_percent.
>> +		 */
>> +		snprintf(buf, sizeof(buf), "0");
>> +		size = write(fd, buf, strlen(buf));
>> +		if (size <= 0)
>> +			log(TERM, LOG_WARNING, "can't write to buffer_percent\n");
>> +		close(fd);
>> +	} else
>> +		log(TERM, LOG_WARNING, "Can't open buffer_percent\n");
>> +
>>  	for (i = 0; i < (n_cpus + 1); i++)
>>  		fds[i].fd = -1;
>>

Thanks,
Shiju
Linux regression tracking (Thorsten Leemhuis)
Feb. 16, 2023, 1:55 p.m. UTC |
#3
On 16.02.23 14:40, Shiju Jose wrote:
> Hello,
>
>> -----Original Message-----
>> From: Linux regression tracking (Thorsten Leemhuis)
>> <regressions@leemhuis.info>
>> Sent: 16 February 2023 11:48
>> To: rostedt@goodmis.org
>> Cc: mhiramat@kernel.org; linux-kernel@vger.kernel.org;
>> linux-trace-kernel@vger.kernel.org; tanxiaofei <tanxiaofei@huawei.com>;
>> Jonathan Cameron <jonathan.cameron@huawei.com>; Linuxarm
>> <linuxarm@huawei.com>; Linux kernel regressions list
>> <regressions@lists.linux.dev>; Shiju Jose <shiju.jose@huawei.com>;
>> mchehab@kernel.org; linux-edac@vger.kernel.org
>> Subject: Re: [RFC PATCH V2 1/1] rasdaemon: Fix poll() on per_cpu
>> trace_pipe_raw blocks indefinitely
>>
>> Hi, this is your Linux kernel regression tracker.
>
> Kernel fix patch for this issue is already in the mainline. Please see the commit
> 3e46d910d8acf94e5360126593b68bf4fee4c4a1
> ("tracing: Fix poll() and select() do not work on per_cpu trace_pipe and trace_pipe_raw")

Great, thx for letting me know.

>> On 04.02.23 20:33, shiju.jose@huawei.com wrote:
>>> From: Shiju Jose <shiju.jose@huawei.com>
>>>
>>> The error events are not received in the rasdaemon since kernel 6.1-rc6.
>>> This issue is firstly detected and reported, when testing the CXL
>>> error events in the rasdaemon.
>>
>> Thanks for working on this. This submission looks stalled, unless I missed
>> something. This is unfortunate, as this afaics is fixing a regression (caused by a
>> commit from Steven). Hence it would be good to get this fixed rather sooner
>> than later. Or is the RFC in the subject the reason why there was no progress? Is
>> it maybe time to remove it?
>
> I made the pull request for this rasdaemon patch here,
> https://github.com/mchehab/rasdaemon/pull/86

Ha, stupid me, I didn't even notice this thread was about a rasdaemon
change (I landed here as the patch description linked to the tracked
regression report).

Apologies for mixing this up; I deal with a lot of regression reports
and try to avoid mistakes like this, but they nevertheless happen. :-/

Ciao, Thorsten
diff --git a/ras-events.c b/ras-events.c
index 3691311..e505a0e 100644
--- a/ras-events.c
+++ b/ras-events.c
@@ -383,6 +383,8 @@ static int read_ras_event_all_cpus(struct pthread_data *pdata,
 	int warnonce[n_cpus];
 	char pipe_raw[PATH_MAX];
 	int legacy_kernel = 0;
+	int fd;
+	char buf[10];
 #if 0
 	int need_sleep = 0;
 #endif
@@ -402,6 +404,26 @@ static int read_ras_event_all_cpus(struct pthread_data *pdata,
 		return -ENOMEM;
 	}
 
+	/* Fix for poll() on the per_cpu trace_pipe and trace_pipe_raw blocks
+	 * indefinitely with the default buffer_percent in the kernel trace system,
+	 * which is introduced by the following change in the kernel.
+	 * https://lore.kernel.org/all/20221020231427.41be3f26@gandalf.local.home/T/#u.
+	 * Set buffer_percent to 0 so that poll() will return immediately
+	 * when the trace data is available in the ras per_cpu trace pipe_raw
+	 */
+	fd = open_trace(pdata[0].ras, "buffer_percent", O_WRONLY);
+	if (fd >= 0) {
+		/* For the backward compatabilty to the old kernel, do not return
+		 * if fail to set the buffer_percent.
+		 */
+		snprintf(buf, sizeof(buf), "0");
+		size = write(fd, buf, strlen(buf));
+		if (size <= 0)
+			log(TERM, LOG_WARNING, "can't write to buffer_percent\n");
+		close(fd);
+	} else
+		log(TERM, LOG_WARNING, "Can't open buffer_percent\n");
+
 	for (i = 0; i < (n_cpus + 1); i++)
 		fds[i].fd = -1;
 