From patchwork Thu Aug 17 11:50:57 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Zheng Yejian X-Patchwork-Id: 135893 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b82d:0:b0:3f2:4152:657d with SMTP id z13csp742088vqi; Thu, 17 Aug 2023 07:29:20 -0700 (PDT) X-Google-Smtp-Source: AGHT+IHj+2al44LJWiSePq58gGr2NVgU38dZcAv2jMTfi31zo51i57FdKp+2b54cvRd80BcF17ei X-Received: by 2002:a05:6a21:3e0c:b0:144:5d5b:8e24 with SMTP id bk12-20020a056a213e0c00b001445d5b8e24mr4929003pzc.24.1692282560450; Thu, 17 Aug 2023 07:29:20 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1692282560; cv=none; d=google.com; s=arc-20160816; b=kibY9KnbHv0TPAy48HqYAiYKStF5zZqqnzqtrG1lqEe2gEZz6wV2s9mu0vP+XZwGCm XhPz03zsFvz6+b7iJWY+mepnAA7GPf3Z3YoeUUNzFuTud5E5qYL4cdqptHT6djinjAsZ AQawxM8G7P8cAvLLeEz6oBFwWUlPsOgewCN5YSC4sC3mGvWa/7lOyM+9bJd9DlTDkmsQ OLmMvxZAsXVGan0vZx40N1eCCGVh2285QQulyXrSgk8ZHCvkDLhMJcAoL51O3wbsYFRr DxyjE7buhVKe6/QMXrFc9Vh3s0WHgOLZYyXtt8SUnMZkMYBWtfAY9698ltdbmbnA4iaE BKKQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from; bh=TrvunQK9vUlECGEolSx6lsTRO3c1fcSWmUucZDFWdxk=; fh=y36lEArMweHnZcy6A4a4WIcFs2cLpnmEV/WSGPCbgcM=; b=M7Xj9zlvWF8o+cSXW/rv6MPZpxLOdVWttRddHmFN2kWGAiKGjbBxBGBJEltyP9uySV b50OR4G2Tn0WttGGYsCAuwTpj+EDWN/HMovYehKMBQ+V1IJbfyOrQJo9kIjNX1Y/voi8 jaLeQ9MR1sR/VZx+p95g8yMDk93Iv8pnAUmiD7Od/TFD7z8FrhBAlOGSD6Jm2oIa0r88 ZpnX8Cq3FR4om2fiQFSPheomLwa22fjzjYnmCV0kcc7AVbjjrgYb/SIuCBZp+pXo31ki 6wJWvsDKjeyXEMRAoE6HaGOuBJbyUK35typV7BR35xuCQrkoOPi5Fagyetz2mREf39fw MnNQ== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=huawei.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id k28-20020a637b5c000000b00565f42b81b2si3323423pgn.162.2023.08.17.07.29.06; Thu, 17 Aug 2023 07:29:20 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=huawei.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1350526AbjHQLwP (ORCPT + 99 others); Thu, 17 Aug 2023 07:52:15 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60440 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1350516AbjHQLvo (ORCPT ); Thu, 17 Aug 2023 07:51:44 -0400 Received: from szxga08-in.huawei.com (szxga08-in.huawei.com [45.249.212.255]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 049B63593; Thu, 17 Aug 2023 04:51:20 -0700 (PDT) Received: from dggpeml500012.china.huawei.com (unknown [172.30.72.57]) by szxga08-in.huawei.com (SkyGuard) with ESMTP id 4RRNbT0Zdnz1GDV1; Thu, 17 Aug 2023 19:49:57 +0800 (CST) Received: from localhost.localdomain (10.67.175.61) by dggpeml500012.china.huawei.com (7.185.36.15) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.31; Thu, 17 Aug 2023 19:51:18 +0800 From: Zheng Yejian To: CC: , , , , Subject: [RFC PATCH] tracing: Introduce pipe_cpumask to avoid race on trace_pipes Date: Thu, 17 Aug 2023 19:50:57 +0800 Message-ID: <20230817115057.1637676-1-zhengyejian1@huawei.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20230816152308.5f887721@gandalf.local.home> References: <20230816152308.5f887721@gandalf.local.home> MIME-Version: 1.0 X-Originating-IP: [10.67.175.61] X-ClientProxiedBy: dggems701-chm.china.huawei.com (10.3.19.178) To dggpeml500012.china.huawei.com (7.185.36.15) X-CFilter-Loop: Reflected X-Spam-Status: No, score=-4.2 required=5.0 tests=BAYES_00,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1774486878087872389 X-GMAIL-MSGID: 1774486878087872389 There is race issue when concurrently splice_read main trace_pipe and per_cpu trace_pipes which will result in data read out being different from what actually writen. As suggested by Steven: > I believe we should add a ref count to trace_pipe and the per_cpu > trace_pipes, where if they are opened, nothing else can read it. > > Opening trace_pipe locks all per_cpu ref counts, if any of them are > open, then the trace_pipe open will fail (and releases any ref counts > it had taken). > > Opening a per_cpu trace_pipe will up the ref count for just that > CPU buffer. This will allow multiple tasks to read different per_cpu > trace_pipe files, but will prevent the main trace_pipe file from > being opened. But because we only need to know whether per_cpu trace_pipe is open or not, using a cpumask instead of using ref count may be easier. After this patch, users will find that: - Main trace_pipe can be opened by only one user, and if it is opened, all per_cpu trace_pipes cannot be opened; - Per_cpu trace_pipes can be opened by multiple users, but each per_cpu trace_pipe can only be opened by one user. And if one of them is opened, main trace_pipe cannot be opened. Suggested-by: Steven Rostedt (Google) Signed-off-by: Zheng Yejian Signed-off-by: Masami Hiramatsu (Google) --- > Does that work for this? Hi, Steven, how about using a cpumask instead of ref count? This patch will also prevent main trace_pipe or anyone per_cpu trace_pipe from being opened by multiple people at the same time. kernel/trace/trace.c | 56 ++++++++++++++++++++++++++++++++++++++------ kernel/trace/trace.h | 2 ++ 2 files changed, 51 insertions(+), 7 deletions(-) diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index b8870078ef58..73f6f4d10d43 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -6705,24 +6705,54 @@ tracing_max_lat_write(struct file *filp, const char __user *ubuf, #endif +static int open_pipe_on_cpu(struct trace_array *tr, int cpu) +{ + if (cpu == RING_BUFFER_ALL_CPUS) { + if (cpumask_empty(tr->pipe_cpumask)) { + cpumask_setall(tr->pipe_cpumask); + return 0; + } + } else { + if (!cpumask_test_cpu(cpu, tr->pipe_cpumask)) { + cpumask_set_cpu(cpu, tr->pipe_cpumask); + return 0; + } + } + return -EBUSY; +} + +static void close_pipe_on_cpu(struct trace_array *tr, int cpu) +{ + if (cpu == RING_BUFFER_ALL_CPUS) { + WARN_ON(!cpumask_full(tr->pipe_cpumask)); + cpumask_clear(tr->pipe_cpumask); + } else { + WARN_ON(!cpumask_test_cpu(cpu, tr->pipe_cpumask)); + cpumask_clear_cpu(cpu, tr->pipe_cpumask); + } +} + static int tracing_open_pipe(struct inode *inode, struct file *filp) { struct trace_array *tr = inode->i_private; struct trace_iterator *iter; int ret; + int cpu = tracing_get_cpu(inode); ret = tracing_check_open_get_tr(tr); if (ret) return ret; mutex_lock(&trace_types_lock); + ret = open_pipe_on_cpu(tr, cpu); + if (ret) + goto fail_pipe_on_cpu; /* create a buffer to store the information to pass to userspace */ iter = kzalloc(sizeof(*iter), GFP_KERNEL); if (!iter) { ret = -ENOMEM; - __trace_array_put(tr); - goto out; + goto fail_alloc_iter; } trace_seq_init(&iter->seq); @@ -6745,7 +6775,7 @@ static int tracing_open_pipe(struct inode *inode, struct file *filp) iter->tr = tr; iter->array_buffer = &tr->array_buffer; - iter->cpu_file = tracing_get_cpu(inode); + iter->cpu_file = cpu; mutex_init(&iter->mutex); filp->private_data = iter; @@ -6755,12 +6785,15 @@ static int tracing_open_pipe(struct inode *inode, struct file *filp) nonseekable_open(inode, filp); tr->trace_ref++; -out: + mutex_unlock(&trace_types_lock); return ret; fail: kfree(iter); +fail_alloc_iter: + close_pipe_on_cpu(tr, cpu); +fail_pipe_on_cpu: __trace_array_put(tr); mutex_unlock(&trace_types_lock); return ret; @@ -6777,7 +6810,7 @@ static int tracing_release_pipe(struct inode *inode, struct file *file) if (iter->trace->pipe_close) iter->trace->pipe_close(iter); - + close_pipe_on_cpu(tr, iter->cpu_file); mutex_unlock(&trace_types_lock); free_cpumask_var(iter->started); @@ -9441,6 +9474,9 @@ static struct trace_array *trace_array_create(const char *name) if (!alloc_cpumask_var(&tr->tracing_cpumask, GFP_KERNEL)) goto out_free_tr; + if (!alloc_cpumask_var(&tr->pipe_cpumask, GFP_KERNEL)) + goto out_free_tr; + tr->trace_flags = global_trace.trace_flags & ~ZEROED_TRACE_FLAGS; cpumask_copy(tr->tracing_cpumask, cpu_all_mask); @@ -9482,6 +9518,7 @@ static struct trace_array *trace_array_create(const char *name) out_free_tr: ftrace_free_ftrace_ops(tr); free_trace_buffers(tr); + free_cpumask_var(tr->pipe_cpumask); free_cpumask_var(tr->tracing_cpumask); kfree(tr->name); kfree(tr); @@ -9584,6 +9621,7 @@ static int __remove_instance(struct trace_array *tr) } kfree(tr->topts); + free_cpumask_var(tr->pipe_cpumask); free_cpumask_var(tr->tracing_cpumask); kfree(tr->name); kfree(tr); @@ -10381,12 +10419,14 @@ __init static int tracer_alloc_buffers(void) if (trace_create_savedcmd() < 0) goto out_free_temp_buffer; + if (!alloc_cpumask_var(&global_trace.pipe_cpumask, GFP_KERNEL)) + goto out_free_savedcmd; + /* TODO: make the number of buffers hot pluggable with CPUS */ if (allocate_trace_buffers(&global_trace, ring_buf_size) < 0) { MEM_FAIL(1, "tracer: failed to allocate ring buffer!\n"); - goto out_free_savedcmd; + goto out_free_pipe_cpumask; } - if (global_trace.buffer_disabled) tracing_off(); @@ -10439,6 +10479,8 @@ __init static int tracer_alloc_buffers(void) return 0; +out_free_pipe_cpumask: + free_cpumask_var(global_trace.pipe_cpumask); out_free_savedcmd: free_saved_cmdlines_buffer(savedcmd); out_free_temp_buffer: diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h index e1edc2197fc8..53ac0f7780c2 100644 --- a/kernel/trace/trace.h +++ b/kernel/trace/trace.h @@ -377,6 +377,8 @@ struct trace_array { struct list_head events; struct trace_event_file *trace_marker_file; cpumask_var_t tracing_cpumask; /* only trace on set CPUs */ + /* one per_cpu trace_pipe can be opened by only one user */ + cpumask_var_t pipe_cpumask; int ref; int trace_ref; #ifdef CONFIG_FUNCTION_TRACER