Message ID | 20230726121618.19198-1-zegao@tencent.com |
---|---|
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:a985:0:b0:3e4:2afc:c1 with SMTP id t5csp415236vqo; Wed, 26 Jul 2023 06:48:40 -0700 (PDT) X-Google-Smtp-Source: APBJJlGYdLATiM4DOdfrOSETXg23aVs+pqj/3PERV7l/xY1sYaQb4omx6C9cr2DtweuvzV1psxfT X-Received: by 2002:a05:6402:341:b0:51e:2305:931 with SMTP id r1-20020a056402034100b0051e23050931mr1711649edw.22.1690379319963; Wed, 26 Jul 2023 06:48:39 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1690379319; cv=none; d=google.com; s=arc-20160816; b=H1hYh7cQaLG84oS7DSBChxOhsDQofEwKKPbcj9z+NKlEQwSz7LEB/GSP6FGMKYV/Af 76dTPn0sjZVXoZiVopTYga8E8bW2LCbXoaFvxpPEq0CL8mYx8A4O1TuC9gNQ+gOb6udu 6x2U6A2ln/k90DPDkVhtQaerQJwxgjVxU54YXMzvtibpG4iSpGsxl1E6+SEox02jduCf N4MUVRQ38EK9evQFcaPs2UySK2mbkyt1fNzuZdoIbZnf69FQ3uPLy6U+aVeGBsBXeO4A AHwB1JizWfssFEw0EVp/wiL4trAlLkquttvuJMWD021rPd+0/NBhWtr7a0FFXzpybLO3 JoqA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :message-id:date:subject:cc:to:from:dkim-signature; bh=P/SfUH6qagd9RztYevRFo2Q3j6jWs29w7wHJ0ZcNrWE=; fh=bdj1owIhHCrd5hN/dHRLD7g8I8CgOAzvEwVKQ1Hbjho=; b=Qqso0IbTjlOCRUKoKc1iyc1Y6R10GnxUjwzmuJBeZcNCgvHUFqvkU8DH0fzG0mI/Zv aZe1SekZzd1UI9q6EPucQN2cnyTRT2Iz1iUOjkGpEEwW5zzzrsdkRcm2jdxfV5Mad7GW 8zpub2zJBWIaLng7prsDcHQWUq6+BcvrNi2lS49Smc+GreKG+BiGA51jt3zESAbD1aAM 4og6Bgvd2vzjFegs2pOapDdt/DfvU+IwXzxA0IHHuJEC3MAdy7//6aPg3RPDq7a0p3BF Sm9gDGdtyDrHMhgeFbmxJE87Od7XNT+/cm0dPtLvQHP7/xEXdCVztJQfi/S9RxQY7C7v ogHg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gmail.com header.s=20221208 header.b=GmigUXHN; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id f24-20020aa7d858000000b005225932be75si1353147eds.64.2023.07.26.06.48.15; Wed, 26 Jul 2023 06:48:39 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20221208 header.b=GmigUXHN; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232300AbjGZMQa (ORCPT <rfc822;kloczko.tomasz@gmail.com> + 99 others); Wed, 26 Jul 2023 08:16:30 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:32782 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229715AbjGZMQ2 (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Wed, 26 Jul 2023 08:16:28 -0400 Received: from mail-pg1-x52d.google.com (mail-pg1-x52d.google.com [IPv6:2607:f8b0:4864:20::52d]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0C4D819A7; Wed, 26 Jul 2023 05:16:28 -0700 (PDT) Received: by mail-pg1-x52d.google.com with SMTP id 41be03b00d2f7-55ae2075990so3578900a12.0; Wed, 26 Jul 2023 05:16:28 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20221208; t=1690373787; x=1690978587; h=content-transfer-encoding:mime-version:message-id:date:subject:cc :to:from:from:to:cc:subject:date:message-id:reply-to; bh=P/SfUH6qagd9RztYevRFo2Q3j6jWs29w7wHJ0ZcNrWE=; b=GmigUXHNKivEF+qJWOpYMSidwpXidWB/BVuh8qbB8lwRsTqUReNT9UUAdrL9YB5pff ASxe6g2NoBplCcNWZUumZT20q+Ood/JVVApgHUyhnz37V7fKpZCDWAMyGQ6xkyspP/Pz yEJw8sCXmFLiSZnQs2uw09KO3AFO5rF33dHntIqg/Ci56bl8fnnbltducGEWhBrMKdnz OvbdgPl/6Jt4P9Ok2arH86Zo3k4a0rY/IJQVvmpB+YY9t6IcFnz4u3oln39B0xIQcLrz rxHx8zAcLMx+GRSP1Gpu2oNzGjkTAUFNRhU8DLDvhHPehy5+ZE6zKkcpEQzdQKJd4yKH 7+MA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1690373787; x=1690978587; h=content-transfer-encoding:mime-version:message-id:date:subject:cc :to:from:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=P/SfUH6qagd9RztYevRFo2Q3j6jWs29w7wHJ0ZcNrWE=; b=ByouSNpscg6VNGH4TjPaAnw4pjQkGd4B7U55Vm4GuP230h1x/8VE0YKORo0NjpN+4N dfQmw2njFixVa8NlUVonRqN3flaFw2jtOVEj7MxSGhde6Nr7+XCc8JbFmPwPlOOghQHP f/0KQGoDR5YdRmAwk8bJeMLzn4lckAAM17+kCkuBTWX9l7V8sFJfMxRF17oVe0SHK619 ZHIiJ2Hk7zJgGyaFm5uQSuMU2hpG4bUiUsaW6A1K58+tBCvzjc8DFIQSijwV0MFcza0M quu4Du3iGRzxBrHsmN5N4jxpL+O9AgQ8dYNFcXe08mlaBNjTlxqOMFBPSagjYanov8Ja 7nDQ== X-Gm-Message-State: ABy/qLb5aGfbEiZM+HOXomQ5ghqfvm79RdsaqTsxvI1Nk5c2AS2oeTy1 DnydbJd9GXM6kjY/Kyc5BbY= X-Received: by 2002:a17:90a:408f:b0:263:3386:9da8 with SMTP id l15-20020a17090a408f00b0026333869da8mr1155414pjg.49.1690373787347; Wed, 26 Jul 2023 05:16:27 -0700 (PDT) Received: from localhost.localdomain ([203.205.141.25]) by smtp.googlemail.com with ESMTPSA id ms19-20020a17090b235300b00263f446d432sm1172880pjb.43.2023.07.26.05.16.23 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 26 Jul 2023 05:16:26 -0700 (PDT) From: Ze Gao <zegao2021@gmail.com> X-Google-Original-From: Ze Gao <zegao@tencent.com> To: Peter Zijlstra <peterz@infradead.org>, Steven Rostedt <rostedt@goodmis.org>, Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com>, Alexander Shishkin <alexander.shishkin@linux.intel.com>, Arnaldo Carvalho de Melo <acme@kernel.org>, Ian Rogers <irogers@google.com>, Ingo Molnar <mingo@redhat.com>, Jiri Olsa <jolsa@kernel.org>, Mark Rutland <mark.rutland@arm.com>, Masami Hiramatsu <mhiramat@kernel.org>, linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org, linux-trace-kernel@vger.kernel.org, linux-trace-devel@vger.kernel.org, Ze Gao <zegao@tencent.com> Subject: [RFC PATCH v2 0/3] report task state in symbolic chars from sched tracepoint Date: Wed, 26 Jul 2023 20:16:15 +0800 Message-Id: <20230726121618.19198-1-zegao@tencent.com> X-Mailer: git-send-email 2.40.1 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_ENVFROM_END_DIGIT, FREEMAIL_FROM,RCVD_IN_DNSWL_BLOCKED,SPF_HELO_NONE,SPF_PASS, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1772491185709921604 X-GMAIL-MSGID: 1772491185709921604 |
Series |
report task state in symbolic chars from sched tracepoint
|
|
Message
Ze Gao
July 26, 2023, 12:16 p.m. UTC
This is the 2nd attempt to fix the report task state issue in sched tracepint, here is the first version: https://lore.kernel.org/linux-trace-kernel/20230725072254.32045-1-zegao@tencent.com Against v1, add a new var to report task state in symbolic char instead of replacing the old one and to not to break anything. -- In the status quo, we should see three different outcomes of the reported sched-out task state from perf-script, perf-sched-timehist, and Tp_printk of tracepoint sched_switch. And it's not hard to figure out that the former two are built upon the third one, and the reason why we see this inconsistency is that the former two does not catch up with the internal change of reported task state definitions as the kernel evolves. IMHO, exporting internal representations of task state in the tracepoint sched_switch is not a good practice and not encouraged at all, which can easily break userspace tools that relies on it. Especially when tracepoints are massively used in many observability tools nowadays due to its stable nature, which makes them no longer used for debug only purpose and we should be careful to decide what ought to be reported to userspace and what ought not. Therefore, to fix the issues mentioned above for good, instead of choosing I proposed to add a new variable to report task state in sched_switch with a symbolic character along with the old hardcoded value, and save the further processing of userspace tools and spare them from knowing implementation details in the kernel. After this patch seires, we report 'RSDTtXZPI' the same as in procfs, plus a 'p' which denotes PREEMP_ACTIVE and is used for sched_switch tracepoint only. Reviews welcome! Regards, Ze Ze Gao (2): sched, tracing: add to report task state in symbolic chars perf sched: use the new prev_state_char instead in tracepoint sched_switch include/trace/events/sched.h | 60 +++++++++++++++++++++--------------- tools/perf/builtin-sched.c | 57 ++++++---------------------------- 2 files changed, 45 insertions(+), 72 deletions(-) Ze Gao (1): libtraceevent: use the new prev_state_char instead in tracepoint sched_switch plugins/plugin_sched_switch.c | 29 ++++------------------------- 1 file changed, 4 insertions(+), 25 deletions(-)
Comments
On Wed, Jul 26, 2023 at 5:16 AM Ze Gao <zegao2021@gmail.com> wrote: > > > This is the 2nd attempt to fix the report task state issue in sched > tracepint, here is the first version: > > https://lore.kernel.org/linux-trace-kernel/20230725072254.32045-1-zegao@tencent.com > > Against v1, add a new var to report task state in symbolic char instead > of replacing the old one and to not to break anything. > > -- > > In the status quo, we should see three different outcomes of the reported > sched-out task state from perf-script, perf-sched-timehist, and Tp_printk > of tracepoint sched_switch. And it's not hard to figure out that the > former two are built upon the third one, and the reason why we see this > inconsistency is that the former two does not catch up with the internal > change of reported task state definitions as the kernel evolves. > > IMHO, exporting internal representations of task state in the tracepoint > sched_switch is not a good practice and not encouraged at all, which can > easily break userspace tools that relies on it. Especially when tracepoints > are massively used in many observability tools nowadays due to its stable > nature, which makes them no longer used for debug only purpose and we > should be careful to decide what ought to be reported to userspace and what > ought not. > > Therefore, to fix the issues mentioned above for good, instead of choosing > I proposed to add a new variable to report task state in sched_switch with > a symbolic character along with the old hardcoded value, and save the > further processing of userspace tools and spare them from knowing > implementation details in the kernel. > > After this patch seires, we report 'RSDTtXZPI' the same as in procfs, plus > a 'p' which denotes PREEMP_ACTIVE and is used for sched_switch tracepoint only. > > Reviews welcome! Thanks Ze, I think this is worthwhile cleanup and makes the code overall simpler. I don't know if others have strong opinions, I don't often work in this code, but I think the patches are worth landing this. Acked-by: Ian Rogers <irogers@google.com> Thanks, Ian > Regards, > > Ze > > Ze Gao (2): > sched, tracing: add to report task state in symbolic chars > perf sched: use the new prev_state_char instead in tracepoint > sched_switch > > include/trace/events/sched.h | 60 +++++++++++++++++++++--------------- > tools/perf/builtin-sched.c | 57 ++++++---------------------------- > 2 files changed, 45 insertions(+), 72 deletions(-) > > Ze Gao (1): > libtraceevent: use the new prev_state_char instead in tracepoint > sched_switch > > plugins/plugin_sched_switch.c | 29 ++++------------------------- > 1 file changed, 4 insertions(+), 25 deletions(-) > > -- > 2.40.1 >
Thanks Ian, In regard to ABI, symbolic chars are much more stable and I think we can benefit from this in the long run. Regards, Ze On Sat, Jul 29, 2023 at 1:29 AM Ian Rogers <irogers@google.com> wrote: > > On Wed, Jul 26, 2023 at 5:16 AM Ze Gao <zegao2021@gmail.com> wrote: > > > > > > This is the 2nd attempt to fix the report task state issue in sched > > tracepint, here is the first version: > > > > https://lore.kernel.org/linux-trace-kernel/20230725072254.32045-1-zegao@tencent.com > > > > Against v1, add a new var to report task state in symbolic char instead > > of replacing the old one and to not to break anything. > > > > -- > > > > In the status quo, we should see three different outcomes of the reported > > sched-out task state from perf-script, perf-sched-timehist, and Tp_printk > > of tracepoint sched_switch. And it's not hard to figure out that the > > former two are built upon the third one, and the reason why we see this > > inconsistency is that the former two does not catch up with the internal > > change of reported task state definitions as the kernel evolves. > > > > IMHO, exporting internal representations of task state in the tracepoint > > sched_switch is not a good practice and not encouraged at all, which can > > easily break userspace tools that relies on it. Especially when tracepoints > > are massively used in many observability tools nowadays due to its stable > > nature, which makes them no longer used for debug only purpose and we > > should be careful to decide what ought to be reported to userspace and what > > ought not. > > > > Therefore, to fix the issues mentioned above for good, instead of choosing > > I proposed to add a new variable to report task state in sched_switch with > > a symbolic character along with the old hardcoded value, and save the > > further processing of userspace tools and spare them from knowing > > implementation details in the kernel. > > > > After this patch seires, we report 'RSDTtXZPI' the same as in procfs, plus > > a 'p' which denotes PREEMP_ACTIVE and is used for sched_switch tracepoint only. > > > > Reviews welcome! > > Thanks Ze, > > I think this is worthwhile cleanup and makes the code overall simpler. > I don't know if others have strong opinions, I don't often work in > this code, but I think the patches are worth landing this. > > Acked-by: Ian Rogers <irogers@google.com> > > Thanks, > Ian > > > Regards, > > > > Ze > > > > Ze Gao (2): > > sched, tracing: add to report task state in symbolic chars > > perf sched: use the new prev_state_char instead in tracepoint > > sched_switch > > > > include/trace/events/sched.h | 60 +++++++++++++++++++++--------------- > > tools/perf/builtin-sched.c | 57 ++++++---------------------------- > > 2 files changed, 45 insertions(+), 72 deletions(-) > > > > Ze Gao (1): > > libtraceevent: use the new prev_state_char instead in tracepoint > > sched_switch > > > > plugins/plugin_sched_switch.c | 29 ++++------------------------- > > 1 file changed, 4 insertions(+), 25 deletions(-) > > > > -- > > 2.40.1 > >
On Wed, 26 Jul 2023 20:16:15 +0800 Ze Gao <zegao2021@gmail.com> wrote: > > This is the 2nd attempt to fix the report task state issue in sched > tracepint, here is the first version: > > https://lore.kernel.org/linux-trace-kernel/20230725072254.32045-1-zegao@tencent.com > > Against v1, add a new var to report task state in symbolic char instead > of replacing the old one and to not to break anything. > > -- > > In the status quo, we should see three different outcomes of the reported > sched-out task state from perf-script, perf-sched-timehist, and Tp_printk > of tracepoint sched_switch. And it's not hard to figure out that the > former two are built upon the third one, and the reason why we see this > inconsistency is that the former two does not catch up with the internal > change of reported task state definitions as the kernel evolves. > > IMHO, exporting internal representations of task state in the tracepoint > sched_switch is not a good practice and not encouraged at all, which can > easily break userspace tools that relies on it. Especially when tracepoints > are massively used in many observability tools nowadays due to its stable > nature, which makes them no longer used for debug only purpose and we > should be careful to decide what ought to be reported to userspace and what > ought not. > > Therefore, to fix the issues mentioned above for good, instead of choosing > I proposed to add a new variable to report task state in sched_switch with > a symbolic character along with the old hardcoded value, and save the > further processing of userspace tools and spare them from knowing > implementation details in the kernel. > > After this patch seires, we report 'RSDTtXZPI' the same as in procfs, plus > a 'p' which denotes PREEMP_ACTIVE and is used for sched_switch tracepoint only. This series looks good to me. Putting the flag in the trace record is a good idea :) Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> for this series. Thank you, > > Reviews welcome! > > Regards, > > Ze > > Ze Gao (2): > sched, tracing: add to report task state in symbolic chars > perf sched: use the new prev_state_char instead in tracepoint > sched_switch > > include/trace/events/sched.h | 60 +++++++++++++++++++++--------------- > tools/perf/builtin-sched.c | 57 ++++++---------------------------- > 2 files changed, 45 insertions(+), 72 deletions(-) > > Ze Gao (1): > libtraceevent: use the new prev_state_char instead in tracepoint > sched_switch > > plugins/plugin_sched_switch.c | 29 ++++------------------------- > 1 file changed, 4 insertions(+), 25 deletions(-) > > -- > 2.40.1 >
Hi Masami, Thanks for your review! Sincerely, Ze On Mon, Jul 31, 2023 at 11:53 AM Masami Hiramatsu <mhiramat@kernel.org> wrote: > > On Wed, 26 Jul 2023 20:16:15 +0800 > Ze Gao <zegao2021@gmail.com> wrote: > > > > > This is the 2nd attempt to fix the report task state issue in sched > > tracepint, here is the first version: > > > > https://lore.kernel.org/linux-trace-kernel/20230725072254.32045-1-zegao@tencent.com > > > > Against v1, add a new var to report task state in symbolic char instead > > of replacing the old one and to not to break anything. > > > > -- > > > > In the status quo, we should see three different outcomes of the reported > > sched-out task state from perf-script, perf-sched-timehist, and Tp_printk > > of tracepoint sched_switch. And it's not hard to figure out that the > > former two are built upon the third one, and the reason why we see this > > inconsistency is that the former two does not catch up with the internal > > change of reported task state definitions as the kernel evolves. > > > > IMHO, exporting internal representations of task state in the tracepoint > > sched_switch is not a good practice and not encouraged at all, which can > > easily break userspace tools that relies on it. Especially when tracepoints > > are massively used in many observability tools nowadays due to its stable > > nature, which makes them no longer used for debug only purpose and we > > should be careful to decide what ought to be reported to userspace and what > > ought not. > > > > Therefore, to fix the issues mentioned above for good, instead of choosing > > I proposed to add a new variable to report task state in sched_switch with > > a symbolic character along with the old hardcoded value, and save the > > further processing of userspace tools and spare them from knowing > > implementation details in the kernel. > > > > After this patch seires, we report 'RSDTtXZPI' the same as in procfs, plus > > a 'p' which denotes PREEMP_ACTIVE and is used for sched_switch tracepoint only. > > This series looks good to me. Putting the flag in the trace record is > a good idea :) > > Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> > > for this series. > > Thank you, > > > > > Reviews welcome! > > > > Regards, > > > > Ze > > > > Ze Gao (2): > > sched, tracing: add to report task state in symbolic chars > > perf sched: use the new prev_state_char instead in tracepoint > > sched_switch > > > > include/trace/events/sched.h | 60 +++++++++++++++++++++--------------- > > tools/perf/builtin-sched.c | 57 ++++++---------------------------- > > 2 files changed, 45 insertions(+), 72 deletions(-) > > > > Ze Gao (1): > > libtraceevent: use the new prev_state_char instead in tracepoint > > sched_switch > > > > plugins/plugin_sched_switch.c | 29 ++++------------------------- > > 1 file changed, 4 insertions(+), 25 deletions(-) > > > > -- > > 2.40.1 > > > > > -- > Masami Hiramatsu (Google) <mhiramat@kernel.org>