[v2,1/3] tracing/user_events: Fix incorrect return value for writing operation when events are disabled
Message ID | 20230609030302.1278716-2-sunliming@kylinos.cn |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:994d:0:b0:3d9:f83d:47d9 with SMTP id k13csp680092vqr; Thu, 8 Jun 2023 20:08:09 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ7GUY58uVMaZcWbCvA/9MTW9DA1M90efV4i1XEFNg2xLQHOFhiqIKp57G5iW6XwdHd3qz5z X-Received: by 2002:a05:6870:b782:b0:1a0:2fd5:116a with SMTP id ed2-20020a056870b78200b001a02fd5116amr232975oab.16.1686280088813; Thu, 08 Jun 2023 20:08:08 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1686280088; cv=none; d=google.com; s=arc-20160816; b=GtZiETgj/wT2O1uFR3dCsY8PEQF9tb3ASKs+CDHL5nhoY+YDX8lE5dz42rKvDVom/y Xwcrcub9zKtgRntjvXdRjVhAj9/IPlZUSfvt/hsHitqq7wXrVqbHh29Fd87oS5GroZOv 09GjUWqdfnqqhYatsrk6gS7n8V9r8xG+aLIlJXRCk//ssNqo3yTP4jfPaIiIRWsQq0Oo VO2dSiXiOcn2IAf/JlggKkhljRnvKVVu7wHwD26fujqvkWuxgYuGuWeQS8GKwQ9EcT3m v7Z0H3ucqbblpAh2qIHPdY8bv0n7/ZBht9souWau6UaxoY4KzQevwhOuPC/u+EYqxJal AEuA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from; bh=PcKARLPblfhsOqgdK39JcUbv8MjDLZ8IvMWHzSBNScQ=; b=ofO6n+hbtmsavNqgoIJp6jCY5kA3bj4FVb1OP+HqYiMiXte70llcUWZlc1gPtFfpxt mFPhuyRvZfzODZ0xqlZ8ZiKor/dJOkl4xVtjUF9TTAmAJ7GP8mJW3SflhiQ4L7RXGdbS cYOesewoM+lBlBmN2hJyRybs0g0N4LoIpIt94vlx+NpsoTOe3I5cErWO1khcmrCpIimP pxuj7bNflZEQaoWc+0aVuuRzoxODYD7yxigr1nDKnaWkZGWmbhepiDBXuzmkqp8DyWEd Qm5Zo5Fxg3uRCbV03D7pCYGn4KR4zGpnaNqdTRKUvLnIQnPtZj6CCLOfmieHKLUD3xQk 72fg== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id 24-20020a631258000000b00543c8ad57f3si1840610pgs.82.2023.06.08.20.07.54; Thu, 08 Jun 2023 20:08:08 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238021AbjFIDEb (ORCPT <rfc822;literming00@gmail.com> + 99 others); Thu, 8 Jun 2023 23:04:31 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52376 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S237892AbjFIDEW (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Thu, 8 Jun 2023 23:04:22 -0400 Received: from mail-pf1-f194.google.com (mail-pf1-f194.google.com [209.85.210.194]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E7B4230E8; Thu, 8 Jun 2023 20:04:20 -0700 (PDT) Received: by mail-pf1-f194.google.com with SMTP id d2e1a72fcca58-653f9c7b3e4so1022860b3a.2; Thu, 08 Jun 2023 20:04:20 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1686279860; x=1688871860; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=PcKARLPblfhsOqgdK39JcUbv8MjDLZ8IvMWHzSBNScQ=; b=NmRQRMaxM0+7angDvMYYf6yLkZ4m6S9gu+3FQujEUmAoGSOylV6PKKh4c6Rr5Ez0Ny 1ExgMMZLKcL9CadORIpPXlumnAbIvuv5ONAGDtHzrJLy2zEhypmD1W6TMXKmaZ8ji/fI WWNpC10Qzj9NC9zatuV/mWDG8K1OttCaZNgDtqoIoLg8Vssrvr5/8B4Hq5x4Nalcxjp+ EXbfVTPYw/WguXryW26gKXYIwxsoDYzsx3f2Afxz+w8Scx23YXKi64wkshHXbJrrSaLv HuIjjNBG6vGCUYEN4fCbU8RGrGf3EBH20ytxke/lrGB0UciQ+5l1uCoY44kKutqAr5bD b3ug== X-Gm-Message-State: AC+VfDwK6hQWaxghY80Aexe3BgH55bvVUtiDFrSwSRCfFfW8/tAHBfmC vWJC2jD+LBqr5sfogUVWmw== X-Received: by 2002:a05:6a00:1a4e:b0:662:b5de:27f6 with SMTP id h14-20020a056a001a4e00b00662b5de27f6mr52364pfv.17.1686279860393; Thu, 08 Jun 2023 20:04:20 -0700 (PDT) Received: from localhost.localdomain ([116.128.244.169]) by smtp.gmail.com with ESMTPSA id a16-20020a62bd10000000b0065ecdefa57fsm1713778pff.0.2023.06.08.20.04.18 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 08 Jun 2023 20:04:20 -0700 (PDT) From: sunliming <sunliming@kylinos.cn> To: mhiramat@kernel.org, beaub@linux.microsoft.com, rostedt@goodmis.org, shuah@kernel.org Cc: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org, kelulanainsley@gmail.com, sunliming <sunliming@kylinos.cn> Subject: [PATCH v2 1/3] tracing/user_events: Fix incorrect return value for writing operation when events are disabled Date: Fri, 9 Jun 2023 11:03:00 +0800 Message-Id: <20230609030302.1278716-2-sunliming@kylinos.cn> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20230609030302.1278716-1-sunliming@kylinos.cn> References: <20230609030302.1278716-1-sunliming@kylinos.cn> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-1.4 required=5.0 tests=BAYES_00, FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM,HEADER_FROM_DIFFERENT_DOMAINS, RCVD_IN_DNSWL_NONE,RCVD_IN_MSPIKE_H3,RCVD_IN_MSPIKE_WL,SPF_HELO_NONE, SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1768192830303169462?= X-GMAIL-MSGID: =?utf-8?q?1768192830303169462?= |
Series |
tracing/user_events: Fix incorrect return value for
|
|
Commit Message
sunliming
June 9, 2023, 3:03 a.m. UTC
The writing operation return the count of writes whether events are
enabled or disabled. This is incorrect when events are disabled. Fix
this by just return -ENOENT when events are disabled.
Signed-off-by: sunliming <sunliming@kylinos.cn>
---
kernel/trace/trace_events_user.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
Comments
On Fri, Jun 09, 2023 at 11:03:00AM +0800, sunliming wrote: > The writing operation return the count of writes whether events are > enabled or disabled. This is incorrect when events are disabled. Fix > this by just return -ENOENT when events are disabled. > When testing this patch locally I found that we would occasionally get -ENOENT when events were enabled, but then become disabled, since writes do not have any locking around the tracepoint checks for performance reasons. I've asked a few peers of mine their thoughts on this, whether an error should result when there are no enabled events. The consensus I've heard back is that they would not consider this case an actual error, just as writing to /dev/null does not actually return an error. However, if you feel strongly we need this and have a good use case, it seems better to enable this logic behind a flag instead of having it default based on my conversations with others. Thanks, -Beau > Signed-off-by: sunliming <sunliming@kylinos.cn> > --- > kernel/trace/trace_events_user.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/kernel/trace/trace_events_user.c b/kernel/trace/trace_events_user.c > index 1ac5ba5685ed..92204bbe79da 100644 > --- a/kernel/trace/trace_events_user.c > +++ b/kernel/trace/trace_events_user.c > @@ -1957,7 +1957,8 @@ static ssize_t user_events_write_core(struct file *file, struct iov_iter *i) > > if (unlikely(faulted)) > return -EFAULT; > - } > + } else > + return -ENOENT; > > return ret; > } > -- > 2.25.1
Beau Belgrave <beaub@linux.microsoft.com> 于2023年6月17日周六 00:08写道: > > On Fri, Jun 09, 2023 at 11:03:00AM +0800, sunliming wrote: > > The writing operation return the count of writes whether events are > > enabled or disabled. This is incorrect when events are disabled. Fix > > this by just return -ENOENT when events are disabled. > > > > When testing this patch locally I found that we would occasionally get > -ENOENT when events were enabled, but then become disabled, since writes > do not have any locking around the tracepoint checks for performance > reasons. > > I've asked a few peers of mine their thoughts on this, whether an error > should result when there are no enabled events. The consensus I've heard > back is that they would not consider this case an actual error, just as > writing to /dev/null does not actually return an error. > > However, if you feel strongly we need this and have a good use case, it > seems better to enable this logic behind a flag instead of having it > default based on my conversations with others. > > Thanks, > -Beau There is indeed a problem. Once enabled, perform the write operation immediately. Now,when the event is disabled, the trace record appears to be lost. In some situations where data timing is sensitive, it may cause confusion. In this case, not returning an error (as mentioned in your reply, it is not considered this case an actual error) and returning 0 ( meaning that the number of data to be written is 0) may be a good way to handle it? Thanks, -Sunliming > > > Signed-off-by: sunliming <sunliming@kylinos.cn> > > --- > > kernel/trace/trace_events_user.c | 3 ++- > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > diff --git a/kernel/trace/trace_events_user.c b/kernel/trace/trace_events_user.c > > index 1ac5ba5685ed..92204bbe79da 100644 > > --- a/kernel/trace/trace_events_user.c > > +++ b/kernel/trace/trace_events_user.c > > @@ -1957,7 +1957,8 @@ static ssize_t user_events_write_core(struct file *file, struct iov_iter *i) > > > > if (unlikely(faulted)) > > return -EFAULT; > > - } > > + } else > > + return -ENOENT; > > > > return ret; > > } > > -- > > 2.25.1
On Mon, Jun 19, 2023 at 04:51:56PM +0800, sunliming wrote: > Beau Belgrave <beaub@linux.microsoft.com> 于2023年6月17日周六 00:08写道: > > > > On Fri, Jun 09, 2023 at 11:03:00AM +0800, sunliming wrote: > > > The writing operation return the count of writes whether events are > > > enabled or disabled. This is incorrect when events are disabled. Fix > > > this by just return -ENOENT when events are disabled. > > > > > > > When testing this patch locally I found that we would occasionally get > > -ENOENT when events were enabled, but then become disabled, since writes > > do not have any locking around the tracepoint checks for performance > > reasons. > > > > I've asked a few peers of mine their thoughts on this, whether an error > > should result when there are no enabled events. The consensus I've heard > > back is that they would not consider this case an actual error, just as > > writing to /dev/null does not actually return an error. > > > > However, if you feel strongly we need this and have a good use case, it > > seems better to enable this logic behind a flag instead of having it > > default based on my conversations with others. > > > > Thanks, > > -Beau > > > > There is indeed a problem. Once enabled, perform the write operation > immediately. > The immediate write does work, and gets put into a buffer. The ftrace and perf self tests do the above case. So, no worries at this point. > Now,when the event is disabled, the trace record appears to be lost. I'm taking this to mean, if in between the time of the bit check and the actual write() /writev() syscall the event becomes disabled, the event won't write to the buffer. Yes, that is expected. > In some situations > where data timing is sensitive, it may cause confusion. In this case, > not returning an > error (as mentioned in your reply, it is not considered this case an > actual error) and > returning 0 ( meaning that the number of data to be written is 0) may > be a good way > to handle it? This is where I get a little lost. What would a user process do with a return of 0 bytes? It shouldn't retry, since it just hit that small timing window. In reality, it just incurred a temporary excessive syscall cost, but no real data loss (the operator/admin turned the event off). I'm missing why you feel it's important the user process know such a window was hit? Can you help me understand that? I do think returning 0 bytes is better than an error here, but I'd really like to know why the user process wants to know at all. Maybe they have user-space only logging and want to be able to mark there if it's in both spots (kernel and user buffers)? Thanks, -Beau > Thanks, > -Sunliming > > > > > > Signed-off-by: sunliming <sunliming@kylinos.cn> > > > --- > > > kernel/trace/trace_events_user.c | 3 ++- > > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > > > diff --git a/kernel/trace/trace_events_user.c b/kernel/trace/trace_events_user.c > > > index 1ac5ba5685ed..92204bbe79da 100644 > > > --- a/kernel/trace/trace_events_user.c > > > +++ b/kernel/trace/trace_events_user.c > > > @@ -1957,7 +1957,8 @@ static ssize_t user_events_write_core(struct file *file, struct iov_iter *i) > > > > > > if (unlikely(faulted)) > > > return -EFAULT; > > > - } > > > + } else > > > + return -ENOENT; > > > > > > return ret; > > > } > > > -- > > > 2.25.1
Beau Belgrave <beaub@linux.microsoft.com> 于2023年6月20日周二 02:40写道: > > On Mon, Jun 19, 2023 at 04:51:56PM +0800, sunliming wrote: > > Beau Belgrave <beaub@linux.microsoft.com> 于2023年6月17日周六 00:08写道: > > > > > > On Fri, Jun 09, 2023 at 11:03:00AM +0800, sunliming wrote: > > > > The writing operation return the count of writes whether events are > > > > enabled or disabled. This is incorrect when events are disabled. Fix > > > > this by just return -ENOENT when events are disabled. > > > > > > > > > > When testing this patch locally I found that we would occasionally get > > > -ENOENT when events were enabled, but then become disabled, since writes > > > do not have any locking around the tracepoint checks for performance > > > reasons. > > > > > > I've asked a few peers of mine their thoughts on this, whether an error > > > should result when there are no enabled events. The consensus I've heard > > > back is that they would not consider this case an actual error, just as > > > writing to /dev/null does not actually return an error. > > > > > > However, if you feel strongly we need this and have a good use case, it > > > seems better to enable this logic behind a flag instead of having it > > > default based on my conversations with others. > > > > > > Thanks, > > > -Beau > > > > > > > > There is indeed a problem. Once enabled, perform the write operation > > immediately. > > > > The immediate write does work, and gets put into a buffer. The ftrace > and perf self tests do the above case. So, no worries at this point. > > > Now,when the event is disabled, the trace record appears to be lost. > > I'm taking this to mean, if in between the time of the bit check and the > actual write() /writev() syscall the event becomes disabled, the event > won't write to the buffer. Yes, that is expected. > Yes , got it, thank you for your explanation. > > In some situations > > where data timing is sensitive, it may cause confusion. In this case, > > not returning an > > error (as mentioned in your reply, it is not considered this case an > > actual error) and > > returning 0 ( meaning that the number of data to be written is 0) may > > be a good way > > to handle it? > > This is where I get a little lost. What would a user process do with a > return of 0 bytes? It shouldn't retry, since it just hit that small > timing window. In reality, it just incurred a temporary excessive > syscall cost, but no real data loss (the operator/admin turned the event > off). > > I'm missing why you feel it's important the user process know such a > window was hit? > > Can you help me understand that? > I haven't encountered a specific scenario that it's important the user process know such a window was hit. This may be a mistake in my understanding. When someone uses user events checking the output of an event to confirm the execution status of a program, it may cause confusion if someone else prohibits the event. This shouldn't be a serious issue, this patch just makes things look better. Thanks, -Sunliming > I do think returning 0 bytes is better than an error here, but I'd > really like to know why the user process wants to know at all. Maybe > they have user-space only logging and want to be able to mark there if > it's in both spots (kernel and user buffers)? > > Thanks, > -Beau > > > Thanks, > > -Sunliming > > > > > > > > > Signed-off-by: sunliming <sunliming@kylinos.cn> > > > > --- > > > > kernel/trace/trace_events_user.c | 3 ++- > > > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > > > > > diff --git a/kernel/trace/trace_events_user.c b/kernel/trace/trace_events_user.c > > > > index 1ac5ba5685ed..92204bbe79da 100644 > > > > --- a/kernel/trace/trace_events_user.c > > > > +++ b/kernel/trace/trace_events_user.c > > > > @@ -1957,7 +1957,8 @@ static ssize_t user_events_write_core(struct file *file, struct iov_iter *i) > > > > > > > > if (unlikely(faulted)) > > > > return -EFAULT; > > > > - } > > > > + } else > > > > + return -ENOENT; > > > > > > > > return ret; > > > > } > > > > -- > > > > 2.25.1
On Mon, 19 Jun 2023 11:40:44 -0700 Beau Belgrave <beaub@linux.microsoft.com> wrote: > > Now,when the event is disabled, the trace record appears to be lost. > > I'm taking this to mean, if in between the time of the bit check and the > actual write() /writev() syscall the event becomes disabled, the event > won't write to the buffer. Yes, that is expected. > > > In some situations > > where data timing is sensitive, it may cause confusion. In this case, > > not returning an > > error (as mentioned in your reply, it is not considered this case an > > actual error) and > > returning 0 ( meaning that the number of data to be written is 0) may > > be a good way > > to handle it? > > This is where I get a little lost. What would a user process do with a > return of 0 bytes? It shouldn't retry, since it just hit that small > timing window. In reality, it just incurred a temporary excessive > syscall cost, but no real data loss (the operator/admin turned the event > off). Note, this is similar to the race in the kernel with several tracing activities. If a disable happens and the buffer is now off, but the trace is still attempted, zero or NULL (depending on the function) is returned. This just means that tracing is off, and the event should just be dropped. -- Steve
diff --git a/kernel/trace/trace_events_user.c b/kernel/trace/trace_events_user.c index 1ac5ba5685ed..92204bbe79da 100644 --- a/kernel/trace/trace_events_user.c +++ b/kernel/trace/trace_events_user.c @@ -1957,7 +1957,8 @@ static ssize_t user_events_write_core(struct file *file, struct iov_iter *i) if (unlikely(faulted)) return -EFAULT; - } + } else + return -ENOENT; return ret; }