From patchwork Fri Dec 15 17:55:14 2023
X-Patchwork-Submitter: Steven Rostedt
X-Patchwork-Id: 179496
Message-ID: <20231215175546.723558608@goodmis.org>
User-Agent: quilt/0.67
Date: Fri, 15 Dec 2023 12:55:14 -0500
From: Steven Rostedt
To: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org
Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton, Tzvetomir Stoyanov, Vincent Donnefort, Kent Overstreet
Subject: [PATCH v4 12/15] ring-buffer: Just update the subbuffers when changing their allocation order
References: <20231215175502.106587604@goodmis.org>

From: "Steven Rostedt (Google)"

ring_buffer_subbuf_order_set() was creating new ring_buffer_per_cpu
cpu_buffers whose sub-buffers had the updated order, and only if they were
all created successfully would the ring buffer's existing per_cpu buffers
be freed and replaced by them.

The problem is that the freed per_cpu buffers contain state that is then
lost. Running the following commands:

 1. # echo 3 > /sys/kernel/tracing/buffer_subbuf_order
 2. # echo 0 > /sys/kernel/tracing/tracing_cpumask
 3. # echo 1 > /sys/kernel/tracing/snapshot
 4. # echo ff > /sys/kernel/tracing/tracing_cpumask
 5. # echo test > /sys/kernel/tracing/trace_marker

would result in:

 -bash: echo: write error: Bad file descriptor

That's because the state of the per_cpu buffers of the snapshot buffer is
lost when the order is changed (the order of a freed snapshot buffer goes
to 0 to save memory, and when the snapshot buffer is allocated again, it
goes back to matching the main buffer). In operation 2, the snapshot
buffers were set to "disabled" (as all the ring buffer's CPUs were
disabled). In operation 3, the snapshot buffer is allocated and a call to
ring_buffer_subbuf_order_set() replaced its per_cpu buffers, losing the
"record_disabled" count. When the CPUs were enabled again in operation 4,
the atomic_dec(&cpu_buffer->record_disabled) decremented a zero, setting
it to -1. Writing 1 into the snapshot swaps the snapshot buffer with the
main buffer, so now the main buffer is "disabled", and nothing can write
to the ring buffer anymore.
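To make the lost count concrete, here is a minimal user-space sketch of the
failure mode. It is a toy model, not the ring-buffer code: the struct and
helper below are made up for illustration, and only the record_disabled-style
nesting counter mirrors the kernel's behaviour.

	#include <stdio.h>

	/* Toy stand-in for a per-cpu buffer: only the nesting counter matters. */
	struct toy_cpu_buffer {
		int record_disabled;		/* non-zero means writes are refused */
	};

	static int toy_write(struct toy_cpu_buffer *cb)
	{
		return cb->record_disabled ? -1 : 0;	/* -1 plays the role of -EBADF */
	}

	int main(void)
	{
		struct toy_cpu_buffer cb = { .record_disabled = 0 };

		cb.record_disabled++;	/* step 2: tracing_cpumask cleared -> disable */

		/* Bug: rebuilding the buffer from scratch throws the count away. */
		cb.record_disabled = 0;	/* step 3: order change replaced the old buffer */

		cb.record_disabled--;	/* step 4: the matching enable now underflows */

		printf("record_disabled=%d write=%s\n", cb.record_disabled,
		       toy_write(&cb) ? "rejected" : "ok");	/* prints -1, rejected */
		return 0;
	}

With the counter reset behind its back, the enable in step 4 no longer pairs
with the disable from step 2, and the buffer stays "disabled" forever.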
Instead of creating new per_cpu buffers and losing the state of the old
ones, do what the resize code does: allocate the new subbuf pages into the
new_pages linked list of the per_cpu buffer and, only if they all succeed,
replace the old sub-buffers with the new ones. This keeps the per_cpu
buffer descriptor intact and, by doing so, keeps its state.

Fixes: TBD ("ring-buffer: Set new size of the ring buffer sub page")
Signed-off-by: Steven Rostedt (Google)
---
 kernel/trace/ring_buffer.c | 88 ++++++++++++++++++++++++++++++--------
 1 file changed, 71 insertions(+), 17 deletions(-)

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index a93dbbcea83d..9b95297339b6 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -5789,11 +5789,11 @@ EXPORT_SYMBOL_GPL(ring_buffer_subbuf_order_get);
  */
 int ring_buffer_subbuf_order_set(struct trace_buffer *buffer, int order)
 {
-	struct ring_buffer_per_cpu **cpu_buffers;
+	struct ring_buffer_per_cpu *cpu_buffer;
+	struct buffer_page *bpage, *tmp;
 	int old_order, old_size;
 	int nr_pages;
 	int psize;
-	int bsize;
 	int err;
 	int cpu;
 
@@ -5807,11 +5807,6 @@ int ring_buffer_subbuf_order_set(struct trace_buffer *buffer, int order)
 	if (psize <= BUF_PAGE_HDR_SIZE)
 		return -EINVAL;
 
-	bsize = sizeof(void *) * buffer->cpus;
-	cpu_buffers = kzalloc(bsize, GFP_KERNEL);
-	if (!cpu_buffers)
-		return -ENOMEM;
-
 	old_order = buffer->subbuf_order;
 	old_size = buffer->subbuf_size;
 
@@ -5827,33 +5822,88 @@ int ring_buffer_subbuf_order_set(struct trace_buffer *buffer, int order)
 
 	/* Make sure all new buffers are allocated, before deleting the old ones */
 	for_each_buffer_cpu(buffer, cpu) {
+
 		if (!cpumask_test_cpu(cpu, buffer->cpumask))
 			continue;
 
+		cpu_buffer = buffer->buffers[cpu];
+
+		/* Update the number of pages to match the new size */
 		nr_pages = old_size * buffer->buffers[cpu]->nr_pages;
 		nr_pages = DIV_ROUND_UP(nr_pages, buffer->subbuf_size);
-		cpu_buffers[cpu] = rb_allocate_cpu_buffer(buffer, nr_pages, cpu);
-		if (!cpu_buffers[cpu]) {
+
+		/* we need a minimum of two pages */
+		if (nr_pages < 2)
+			nr_pages = 2;
+
+		cpu_buffer->nr_pages_to_update = nr_pages;
+
+		/* Include the reader page */
+		nr_pages++;
+
+		/* Allocate the new size buffer */
+		INIT_LIST_HEAD(&cpu_buffer->new_pages);
+		if (__rb_allocate_pages(cpu_buffer, nr_pages,
+					&cpu_buffer->new_pages)) {
+			/* not enough memory for new pages */
 			err = -ENOMEM;
 			goto error;
 		}
 	}
 
 	for_each_buffer_cpu(buffer, cpu) {
+
 		if (!cpumask_test_cpu(cpu, buffer->cpumask))
 			continue;
 
-		rb_free_cpu_buffer(buffer->buffers[cpu]);
-		buffer->buffers[cpu] = cpu_buffers[cpu];
+		cpu_buffer = buffer->buffers[cpu];
+
+		/* Clear the head bit to make the link list normal to read */
+		rb_head_page_deactivate(cpu_buffer);
+
+		/* Now walk the list and free all the old sub buffers */
+		list_for_each_entry_safe(bpage, tmp, cpu_buffer->pages, list) {
+			list_del_init(&bpage->list);
+			free_buffer_page(bpage);
+		}
+		/* The above loop stopped an the last page needing to be freed */
+		bpage = list_entry(cpu_buffer->pages, struct buffer_page, list);
+		free_buffer_page(bpage);
+
+		/* Free the current reader page */
+		free_buffer_page(cpu_buffer->reader_page);
+
+		/* One page was allocated for the reader page */
+		cpu_buffer->reader_page = list_entry(cpu_buffer->new_pages.next,
+						     struct buffer_page, list);
+		list_del_init(&cpu_buffer->reader_page->list);
+
+		/* The cpu_buffer pages are a link list with no head */
+		cpu_buffer->pages = cpu_buffer->new_pages.next;
+		cpu_buffer->new_pages.next->prev = cpu_buffer->new_pages.prev;
+		cpu_buffer->new_pages.prev->next = cpu_buffer->new_pages.next;
+
+		/* Clear the new_pages list */
+		INIT_LIST_HEAD(&cpu_buffer->new_pages);
+
+		cpu_buffer->head_page
+			= list_entry(cpu_buffer->pages, struct buffer_page, list);
+		cpu_buffer->tail_page = cpu_buffer->commit_page = cpu_buffer->head_page;
+
+		cpu_buffer->nr_pages = cpu_buffer->nr_pages_to_update;
+		cpu_buffer->nr_pages_to_update = 0;
+
+		free_pages((unsigned long)cpu_buffer->free_page, old_order);
+		cpu_buffer->free_page = NULL;
+
+		rb_head_page_activate(cpu_buffer);
+
+		rb_check_pages(cpu_buffer);
 	}
 
 	atomic_dec(&buffer->record_disabled);
 	mutex_unlock(&buffer->mutex);
 
-	kfree(cpu_buffers);
-
 	return 0;
 
 error:
@@ -5864,12 +5914,16 @@ int ring_buffer_subbuf_order_set(struct trace_buffer *buffer, int order)
 	mutex_unlock(&buffer->mutex);
 
 	for_each_buffer_cpu(buffer, cpu) {
-		if (!cpu_buffers[cpu])
+		cpu_buffer = buffer->buffers[cpu];
+
+		if (!cpu_buffer->nr_pages_to_update)
 			continue;
-		rb_free_cpu_buffer(cpu_buffers[cpu]);
-		kfree(cpu_buffers[cpu]);
+
+		list_for_each_entry_safe(bpage, tmp, &cpu_buffer->new_pages, list) {
+			list_del_init(&bpage->list);
+			free_buffer_page(bpage);
+		}
 	}
-	kfree(cpu_buffers);
 
 	return err;
 }
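For readers following along outside the kernel tree, the shape of the fix can
be sketched with plain C and toy types. The struct and function below are
illustrative only (not the ring-buffer API): allocate everything needed for
the new order first, and only once every allocation has succeeded splice the
new page list under the existing descriptor, leaving state such as the
record_disabled count untouched.

	#include <stdlib.h>

	/* Toy descriptor: the state that must survive lives next to the page list. */
	struct toy_cpu_buffer {
		int	record_disabled;	/* preserved because the descriptor is kept */
		size_t	nr_pages;
		void	**pages;		/* stand-in for the sub-buffer list */
	};

	/* Allocate the new pages first; swap them in only if everything succeeded. */
	int toy_set_order(struct toy_cpu_buffer *cb, size_t nr_pages, size_t page_size)
	{
		void **new_pages = calloc(nr_pages, sizeof(*new_pages));
		size_t i;

		if (!new_pages)
			return -1;

		for (i = 0; i < nr_pages; i++) {
			new_pages[i] = malloc(page_size);
			if (!new_pages[i])
				goto error;
		}

		/* Success: free the old pages but keep the descriptor (and its state). */
		for (i = 0; i < cb->nr_pages; i++)
			free(cb->pages[i]);
		free(cb->pages);

		cb->pages = new_pages;
		cb->nr_pages = nr_pages;
		return 0;			/* cb->record_disabled was never touched */

	error:
		while (i--)			/* free only the pages already allocated */
			free(new_pages[i]);
		free(new_pages);
		return -1;
	}

	int main(void)
	{
		struct toy_cpu_buffer cb = { .record_disabled = 1 };	/* currently disabled */

		/* Order change succeeds or fails, but the disable count survives either way. */
		return (toy_set_order(&cb, 4, 4096) == 0 && cb.record_disabled == 1) ? 0 : 1;
	}

The error path mirrors the patch as well: on allocation failure only the
partially built new list is torn down, and the old buffers are left exactly
as they were.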