Message ID | 20240205065340.2848065-1-svens@linux.ibm.com |
---|---|
State | New |
Headers | From: Sven Schnelle <svens@linux.ibm.com>; To: Steven Rostedt <rostedt@goodmis.org>, Masami Hiramatsu <mhiramat@kernel.org>, Mathieu Desnoyers <mathieu.desnoyers@efficios.com>; Cc: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org, Mete Durlu <meted@linux.ibm.com>; Date: Mon, 5 Feb 2024 07:53:40 +0100; Message-Id: <20240205065340.2848065-1-svens@linux.ibm.com> |
Series | tracing: use ring_buffer_record_is_set_on() in tracer_tracing_is_on() |
Commit Message
Sven Schnelle
Feb. 5, 2024, 6:53 a.m. UTC
tracer_tracing_is_on() checks whether record_disabled is not zero. This
tests both the record_disabled counter and the RB_BUFFER_OFF flag, but
reading the source suggests this function should only check for the
RB_BUFFER_OFF flag. Therefore use ring_buffer_record_is_set_on().
This fixes spurious failures in the 'test for function traceon/off triggers'
test from the ftrace testsuite when the system is under load.
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
---
kernel/trace/trace.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
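For context, the difference between the two helpers is small - roughly the
following shape, paraphrasing kernel/trace/ring_buffer.c rather than quoting
it verbatim (RB_BUFFER_OFF is a flag bit kept above the counter bits of
record_disabled):

    /* Paraphrased from kernel/trace/ring_buffer.c, not a verbatim copy. */

    bool ring_buffer_record_is_on(struct trace_buffer *buffer)
    {
            /* false while *any* disable reason is active, including
             * transient internal ring_buffer_record_disable() users */
            return !atomic_read(&buffer->record_disabled);
    }

    bool ring_buffer_record_is_set_on(struct trace_buffer *buffer)
    {
            /* reports only the user-visible on/off flag set by tracing_off() */
            return !(atomic_read(&buffer->record_disabled) & RB_BUFFER_OFF);
    }

Under load, a transient internal disable can therefore make
ring_buffer_record_is_on() report "off" even though tracing was never turned
off by the user - which appears to be the spurious failure mode the patch
addresses.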
Comments
Hi Steven,

Steven Rostedt <rostedt@goodmis.org> writes:

> On Mon, 5 Feb 2024 07:53:40 +0100
> Sven Schnelle <svens@linux.ibm.com> wrote:
>
>> tracer_tracing_is_on() checks whether record_disabled is not zero. This
>> checks both the record_disabled counter and the RB_BUFFER_OFF flag.
>> Reading the source it looks like this function should only check for
>> the RB_BUFFER_OFF flag. Therefore use ring_buffer_record_is_set_on().
>> This fixes spurious fails in the 'test for function traceon/off triggers'
>> test from the ftrace testsuite when the system is under load.
>
> I've seen these spurious failures too, but haven't looked deeper into
> it. Thanks,

Another issue I'm hitting sometimes is this part:

    csum1=`md5sum trace`
    sleep $SLEEP_TIME
    csum2=`md5sum trace`

    if [ "$csum1" != "$csum2" ]; then
        fail "Tracing file is still changing"
    fi

This is because the command line was replaced in the
saved_cmdlines buffer; an example diff between both files is:

     ftracetest-17950      [005] .....  344507.002490: sched_process_wait: comm=ftracetest pid=0 prio=120
     ftracetest-17950      [005] .....  344507.002492: sched_process_wait: comm=ftracetest pid=0 prio=120
    -stress-ng-fanot-17820 [006] d.h..  344507.009901: sched_stat_runtime: comm=stress-ng-fanot pid=17820 runtime=10000054 [ns]
    +          <...>-17820 [006] d.h..  344507.009901: sched_stat_runtime: comm=stress-ng-fanot pid=17820 runtime=10000054 [ns]
     ftracetest-17950      [005] d.h..  344507.009901: sched_stat_runtime: comm=ftracetest pid=17950 runtime=7417915 [ns]
     stress-ng-fanot-17819 [003] d.h..  344507.009901: sched_stat_runtime: comm=stress-ng-fanot pid=17819 runtime=9983473 [ns]
    -stress-ng-fanot-17820 [007] d.h..  344507.079900: sched_stat_runtime: comm=stress-ng-fanot pid=17820 runtime=9999865 [ns]
    +          <...>-17820 [007] d.h..  344507.079900: sched_stat_runtime: comm=stress-ng-fanot pid=17820 runtime=9999865 [ns]
     stress-ng-fanot-17819 [004] d.h..  344507.079900: sched_stat_runtime: comm=stress-ng-fanot pid=17819 runtime=8388039 [ns]

This can be improved by:

    echo 32768 > /sys/kernel/tracing/saved_cmdlines_size

But this is of course not a fix - should we maybe replace the program
name with <...> before comparing, remove the check completely, or do
anything else? What do you think?

Thanks,
Sven
Hi Steven,

Steven Rostedt <rostedt@goodmis.org> writes:

> On Mon, 05 Feb 2024 14:16:30 +0100
> Sven Schnelle <svens@linux.ibm.com> wrote:
>>
>> Another issue I'm hitting sometimes is this part:
>>
>>     csum1=`md5sum trace`
>>     sleep $SLEEP_TIME
>>     csum2=`md5sum trace`
>>
>>     if [ "$csum1" != "$csum2" ]; then
>>         fail "Tracing file is still changing"
>>     fi
>>
>> This is because the command line was replaced in the
>> saved_cmdlines buffer; an example diff between both files
>> is:
>
> [..]
>
>> This can be improved by:
>>
>>     echo 32768 > /sys/kernel/tracing/saved_cmdlines_size
>>
>> But this is of course not a fix - should we maybe replace the program
>> name with <...> before comparing, remove the check completely, or do
>> anything else? What do you think?
>
> Hmm, actually I would say that this exposes a real bug. Not a major
> one, but one that I find annoying. The saved commandlines should only
> be updated when a trace event occurs. But really, it should only be
> updated if one is added to the ring buffer. If the ring buffer isn't
> being updated, we shouldn't be adding new command lines.
>
> There may be a location that has tracing off but still updating the
> cmdlines which will break the saved cache.

Ok, my understanding is that it will override the entry in the list if
another process comes up with the same PID. But I haven't read the code
carefully - let me do that now.
Steven Rostedt <rostedt@goodmis.org> writes:

> On Mon, 05 Feb 2024 14:16:30 +0100
> Sven Schnelle <svens@linux.ibm.com> wrote:
>>
>> Another issue I'm hitting sometimes is this part:
>>
>>     csum1=`md5sum trace`
>>     sleep $SLEEP_TIME
>>     csum2=`md5sum trace`
>>
>>     if [ "$csum1" != "$csum2" ]; then
>>         fail "Tracing file is still changing"
>>     fi
>>
>> This is because the command line was replaced in the
>> saved_cmdlines buffer; an example diff between both files
>> is:
>
> [..]
>
>> This can be improved by:
>>
>>     echo 32768 > /sys/kernel/tracing/saved_cmdlines_size
>>
>> But this is of course not a fix - should we maybe replace the program
>> name with <...> before comparing, remove the check completely, or do
>> anything else? What do you think?
>
> Hmm, actually I would say that this exposes a real bug. Not a major
> one, but one that I find annoying. The saved commandlines should only
> be updated when a trace event occurs. But really, it should only be
> updated if one is added to the ring buffer. If the ring buffer isn't
> being updated, we shouldn't be adding new command lines.
>
> There may be a location that has tracing off but still updating the
> cmdlines which will break the saved cache.

Looking at trace_save_cmdline():

    tpid = tsk->pid & (PID_MAX_DEFAULT - 1);

where PID_MAX_DEFAULT = 0x8000, so this is basically

    tpid = tsk->pid & 0x7fff;

Further on:

    // might clash with another pid if (otherpid & 0x7fff) == (tsk->pid & 0x7fff)
    idx = savedcmd->map_pid_to_cmdline[tpid];
    if (idx == NO_CMDLINE_MAP) {
            // This will pick an existing entry if there are
            // more than cmdline_num entries present
            idx = (savedcmd->cmdline_idx + 1) % savedcmd->cmdline_num;
            savedcmd->map_pid_to_cmdline[tpid] = idx;
            savedcmd->cmdline_idx = idx;
    }

So I think the problem that sometimes '<...>' instead of the correct
comm is logged is just expected behaviour given the code above.
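As an aside, the masking arithmetic quoted above is easy to demonstrate in
isolation. A minimal userspace sketch (the pids here are made up for
illustration; PID_MAX_DEFAULT matches include/linux/threads.h):

    #include <stdio.h>

    #define PID_MAX_DEFAULT 0x8000  /* as in include/linux/threads.h */

    int main(void)
    {
            /* hypothetical pids: any two pids 0x8000 apart collide */
            int pid_a = 17820;
            int pid_b = 17820 + PID_MAX_DEFAULT;

            /* both mask to the same map_pid_to_cmdline index, so the
             * later trace_save_cmdline() overwrites the earlier comm */
            printf("%d -> %#x\n", pid_a, pid_a & (PID_MAX_DEFAULT - 1));
            printf("%d -> %#x\n", pid_b, pid_b & (PID_MAX_DEFAULT - 1));
            return 0;
    }

Both printf lines emit the same index (0x459c), which is the pid-aliasing
clash the first comment in the quoted kernel code refers to; the '<...>'
output also occurs when more than cmdline_num entries recycle a slot.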
On 2/5/24 07:53, Sven Schnelle wrote:
> tracer_tracing_is_on() checks whether record_disabled is not zero. This
> checks both the record_disabled counter and the RB_BUFFER_OFF flag.
> Reading the source it looks like this function should only check for
> the RB_BUFFER_OFF flag. Therefore use ring_buffer_record_is_set_on().
> This fixes spurious fails in the 'test for function traceon/off triggers'
> test from the ftrace testsuite when the system is under load.
>
> Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
> ---
>  kernel/trace/trace.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> index 2a7c6fd934e9..47e221e1e720 100644

Tested-By: Mete Durlu <meted@linux.ibm.com>
Sven Schnelle <svens@linux.ibm.com> writes:

> Looking at trace_save_cmdline():
>
>     tpid = tsk->pid & (PID_MAX_DEFAULT - 1);
>
> where PID_MAX_DEFAULT = 0x8000, so this is basically
>
>     tpid = tsk->pid & 0x7fff;
>
> Further on:
>
>     // might clash with another pid if (otherpid & 0x7fff) == (tsk->pid & 0x7fff)
>     idx = savedcmd->map_pid_to_cmdline[tpid];
>     if (idx == NO_CMDLINE_MAP) {
>             // This will pick an existing entry if there are
>             // more than cmdline_num entries present
>             idx = (savedcmd->cmdline_idx + 1) % savedcmd->cmdline_num;
>             savedcmd->map_pid_to_cmdline[tpid] = idx;
>             savedcmd->cmdline_idx = idx;
>     }
>
> So I think the problem that sometimes '<...>' instead of the correct
> comm is logged is just expected behaviour given the code above.

I added some logging, and the test is not triggering this issue. So I
assume the default of 128 cmdline entries is just too small. Sorry for
the noise. Let's see whether we're still triggering some failures with
the other fix applied in CI. If we do, maybe we want to increase
saved_cmdlines_size for the ftrace test suite.
Hi Steven,

Steven Rostedt <rostedt@goodmis.org> writes:

> On Tue, 06 Feb 2024 09:48:16 +0100
> Sven Schnelle <svens@linux.ibm.com> wrote:
>
>> I added some logging, and the test is not triggering this issue. So I
>> assume the default of 128 cmdline entries is just too small. Sorry for
>> the noise. Let's see whether we're still triggering some failures with
>> the other fix applied in CI. If we do, maybe we want to increase
>> saved_cmdlines_size for the ftrace test suite.
>
> I wonder if it is a good idea to increase the size when tracing starts,
> like the ring buffer expanding code. Maybe to 1024? Or is that still
> too small?

Not sure whether that is enough, I'll have to test. However, it's not
really a fix; it would just require a bit more load and the test would
fail again. The fundamental problem is that even after disabling tracing
there might be some trace line added due to the lockless nature of the
ring buffer. This might then replace an existing cmdline entry. So maybe
we should change the test to ignore the program name when calculating
the checksums.
On 2/7/24 12:09, Steven Rostedt wrote:
> On Wed, 07 Feb 2024 06:50:59 +0100
> Sven Schnelle <svens@linux.ibm.com> wrote:
>
>> Hi Steven,
>
>> Not sure whether that is enough, I'll have to test. However, it's not
>> really a fix; it would just require a bit more load and the test would
>> fail again. The fundamental problem is that even after disabling tracing
>> there might be some trace line added due to the lockless nature of the
>> ring buffer. This might then replace an existing cmdline entry.
>> So maybe we should change the test to ignore the program name when
>> calculating the checksums.
>
> Have you figured out what caused the cmdlines to change when tracing is
> off? It shouldn't be updated even with just 128 entries.
>
> I'm also looking into a way to keep more of an LRU of command lines
> around, but nothing concrete yet.
>
> -- Steve

Hi,

wouldn't the following scenario explain the behavior we are seeing?
When using event triggers, trace uses lockless ring buffer control paths.
If the cmdline update and the trace output reading happen on different
cpus, the ordering can get mixed up:

1. An event happens and the trace trigger tells the ring buffer to stop
   writes.
2. (on cpu#1) The test calculates the checksum of the current state of
   the trace output.
3. (on cpu#2) Not knowing about the trace buffer's status yet, a writer
   adds one last entry - which collides with a pid in the cmdline map -
   before actually stopping. While (on cpu#1) the checksum is being
   calculated, the new saved_cmdlines entry is waiting for the spinlock
   to be released and then gets added.
4. The test calculates the checksum again and finds that the trace
   output has changed: <...> is put on the collided pid.
Steven Rostedt <rostedt@goodmis.org> writes:

> On Wed, 7 Feb 2024 13:07:36 +0100
> Mete Durlu <meted@linux.ibm.com> wrote:
>
>> wouldn't the following scenario explain the behavior we are seeing?
>> When using event triggers, trace uses lockless ring buffer control paths.
>> If the cmdline update and the trace output reading happen on different
>> cpus, the ordering can get mixed up:
>>
>> 1. An event happens and the trace trigger tells the ring buffer to stop
>>    writes.
>> 2. (on cpu#1) The test calculates the checksum of the current state of
>>    the trace output.
>> 3. (on cpu#2) Not knowing about the trace buffer's status yet, a writer
>>    adds one last entry - which collides with a pid in the cmdline map -
>>    before actually stopping. While (on cpu#1) the checksum is being
>>    calculated, the new saved_cmdlines entry is waiting for the spinlock
>>    to be released and then gets added.
>> 4. The test calculates the checksum again and finds that the trace
>>    output has changed: <...> is put on the collided pid.
>
> But the failure is here:
>
>     on=`cat tracing_on`
>     if [ $on != "0" ]; then
>         fail "Tracing is not off"
>     fi

It might be misleading because we're discussing two issues in one
thread. The failure above was one problem, which the initial fix is for.
The other issue we're still seeing is the test below:

>     csum1=`md5sum trace`
>     sleep $SLEEP_TIME
>     csum2=`md5sum trace`
>
>     if [ "$csum1" != "$csum2" ]; then
>         fail "Tracing file is still changing"
>     fi
>
> 1. tracing is off
> 2. do checksum of trace
> 3. sleep
> 4. do another checksum of trace
> 5. compare the two checksums
>
> Now how did they come up differently in that amount of time? The
> saved_cmdlines really should not have been updated.

My assumption without reading the code is that something like this
happens:

    CPU0                                           CPU1
    [ring buffer enabled]
    ring_buffer_write()
      if (atomic_read(&buffer->record_disabled))
              goto out;
                                                   echo 0 > tracing_on
                                                   record_disabled |= RB_BUFFER_OFF
                                                   csum1=`md5sum trace`
    [adds trace entry to the ring buffer,
     overwriting a saved_cmdlines entry,
     because it thinks the ring buffer
     is enabled]
                                                   csum2=`md5sum trace`
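For reference, the off path on CPU1 just ORs the flag into record_disabled
and returns; nothing waits for a writer that has already passed its check.
Roughly, paraphrasing ring_buffer_record_off() from
kernel/trace/ring_buffer.c (not a verbatim copy):

    void ring_buffer_record_off(struct trace_buffer *buffer)
    {
            unsigned int rd;
            unsigned int new_rd;

            /* atomically set RB_BUFFER_OFF without disturbing the counter
             * bits; returns immediately, so an in-flight writer on another
             * CPU that already read record_disabled can still commit */
            do {
                    rd = atomic_read(&buffer->record_disabled);
                    new_rd = rd | RB_BUFFER_OFF;
            } while (atomic_cmpxchg(&buffer->record_disabled, rd, new_rd) != rd);
    }

Since ring-buffer writers run with preemption disabled, waiting for an RCU
grace period after flipping the flag would let any in-flight writer finish
before the test reads the trace - which is what the synchronize_rcu()
experiment in the next mail tries.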
On 2/7/24 16:47, Steven Rostedt wrote:
> On Wed, 07 Feb 2024 14:33:21 +0100
> Sven Schnelle <svens@linux.ibm.com> wrote:
>
>> My assumption without reading the code is that something like this
>> happens:
>>
>>     CPU0                                           CPU1
>>     [ring buffer enabled]
>>     ring_buffer_write()
>>       if (atomic_read(&buffer->record_disabled))
>>               goto out;
>>                                                    echo 0 > tracing_on
>>                                                    record_disabled |= RB_BUFFER_OFF
>>                                                    csum1=`md5sum trace`
>
> Note, CPU1 is performing with preemption disabled, so for this to
> happen, something really bad happened on CPU0 to delay the preempt
> disabled section long enough to allow the trace to be read. Perhaps we
> should have the return of the echo 0 > tracing_on require a
> synchronize_rcu() to make sure all ring buffers see it disabled before
> it returns.
>
> But unless your system is doing something really stressed to cause the
> preempt disabled section to take so long, I highly doubt this was the
> race.

I have only been able to reliably reproduce this issue when the system
is under load from stressors, but I am not sure whether that can be
considered *really stressed*.

    system : 8 cpus (4 physical cores)
    load   : stress-ng --fanotify 1 (or --context 2)
    result : ~5/10 test fails

Of course, as the load increases the test starts to fail more often,
but a single stressor doesn't seem like much to me for a 4-core machine.

After adding synchronize_rcu() plus the patch from Sven, I am no longer
seeing failures with the setup above. So it seems like synchronize_rcu()
did the trick (or at least it helps a lot) for the case described in the
previous mail. I couldn't trigger the failure yet, not even with
increased load (but now the test case takes > 5 mins to finish :) ).
Here is the diff:

    diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
    @@ -9328,10 +9328,12 @@ rb_simple_write(struct file *filp, const char __user *ubuf,
                    val = 0; /* do nothing */
            } else if (val) {
                    tracer_tracing_on(tr);
    +               synchronize_rcu();
                    if (tr->current_trace->start)
                            tr->current_trace->start(tr);
            } else {
                    tracer_tracing_off(tr);
    +               synchronize_rcu();
                    if (tr->current_trace->stop)
                            tr->current_trace->stop(tr);

Not 100% sure whether these were the correct places to add them.
On 2/12/24 19:53, Steven Rostedt wrote:
>
> Right, it will definitely force the race window to go away.
>
> Can you still trigger this issue with just Sven's patch and not this
> change?

Sven's patch makes the test cases much more resilient. I needed to ramp
the load up to 4 stressors on a 4-core system before the test started to
fail again; at this point it is fair to say that the system is under
significant load. The failing condition is now different, which tells me
that Sven's patch has already solved (or improved) the previous issue.
What is failing now:

    echo '** ENABLE TRACING'
    enable_tracing

    cnt=`cnt_trace`
    if [ $cnt -eq 0 ]; then
        fail "Nothing found in trace"
    fi

Adding a sleep between enable_tracing and cnt_trace seems to improve the
situation a lot (my guess is that the trace writer isn't getting any
runtime before the test checks the trace output):

    echo '** ENABLE TRACING'
    enable_tracing

    +sleep $SLEEP_TIME

    cnt=`cnt_trace`
    if [ $cnt -eq 0 ]; then
        fail "Nothing found in trace"
    fi

I didn't see any failures while increasing the system load until 8
stressors (this might have been luck). When the system was under load
with 8 stressors, I saw the test failing 2 out of 10 times. Definitely
an improvement over the current situation.
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 2a7c6fd934e9..47e221e1e720 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -1532,7 +1532,7 @@ void disable_trace_on_warning(void)
 bool tracer_tracing_is_on(struct trace_array *tr)
 {
 	if (tr->array_buffer.buffer)
-		return ring_buffer_record_is_on(tr->array_buffer.buffer);
+		return ring_buffer_record_is_set_on(tr->array_buffer.buffer);
 	return !tr->buffer_disabled;
 }