From patchwork Sun Dec 10 03:54:16 2023
X-Patchwork-Submitter: Steven Rostedt
X-Patchwork-Id: 176275
Message-ID: <20231210040451.704153084@goodmis.org>
User-Agent: quilt/0.67
Date: Sat, 09 Dec 2023 22:54:16 -0500
From: Steven Rostedt
To: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org
Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
 Tzvetomir Stoyanov, Vincent Donnefort, Kent Overstreet
Subject: [PATCH 12/14] ring-buffer: Just update the subbuffers when changing
 their allocation order
References: <20231210035404.053677508@goodmis.org>

From: "Steven Rostedt (Google)"

ring_buffer_subbuf_order_set() was creating new ring_buffer_per_cpu
cpu_buffers with subbuffers of the updated order, and, if they were all
created successfully, the ring_buffer's per_cpu buffers would then be
freed and replaced by them. The problem is that the freed per_cpu
buffers contain state that is then lost. Running the following commands:

 1. # echo 3 > /sys/kernel/tracing/buffer_subbuf_order
 2. # echo 0 > /sys/kernel/tracing/tracing_cpumask
 3. # echo 1 > /sys/kernel/tracing/snapshot
 4. # echo ff > /sys/kernel/tracing/tracing_cpumask
 5. # echo test > /sys/kernel/tracing/trace_marker

would result in:

 -bash: echo: write error: Bad file descriptor

That's because the state of the per_cpu buffers of the snapshot buffer
is lost when the order is changed (the order of a freed snapshot buffer
goes to 0 to save memory, and when the snapshot buffer is allocated
again, it goes back to what the main buffer is). In operation 2, the
snapshot buffers were set to "disable" (as all the ring buffer's CPUs
were disabled). In operation 3, the snapshot is allocated, and a call to
ring_buffer_subbuf_order_set() replaced the per_cpu buffers, losing the
"record_disable" count. When it was enabled again, the
atomic_dec(&cpu_buffer->record_disable) was decrementing a zero, setting
it to -1. Writing 1 into the snapshot would swap the snapshot buffer
with the main buffer, so now the main buffer is "disabled", and nothing
can write to the ring buffer anymore.
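The underflow is easy to model outside the kernel. Below is a minimal
user-space sketch of the lost state (a plain int stands in for the
atomic_t, the struct and variable names are made up, and a free()/calloc()
pair stands in for the old create-and-replace path); it illustrates the
failure mode, it is not the kernel code:

	#include <stdio.h>
	#include <stdlib.h>

	struct cpu_buf {
		int record_disable;	/* stands in for atomic_t record_disable */
	};

	int main(void)
	{
		struct cpu_buf *buf = calloc(1, sizeof(*buf));

		buf->record_disable++;	/* operation 2: CPU disabled via tracing_cpumask */

		/* The buggy order change: free the descriptor and start over */
		free(buf);
		buf = calloc(1, sizeof(*buf));	/* record_disable silently reset to 0 */

		buf->record_disable--;	/* operation 4: re-enable decrements zero */
		printf("record_disable = %d\n", buf->record_disable);	/* prints -1 */

		free(buf);
		return 0;
	}

Once the count underflows, record_disable is nonzero and the buffer
reads as disabled, which matches the write error seen in operation 5.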
Instead of creating new per_cpu buffers and losing the state of the old
ones, do what the resize logic does: allocate new subbuf pages onto the
new_pages linked list of each per_cpu buffer, and only if they all
succeed, replace the old sub buffers with the new ones. This keeps the
per_cpu buffer descriptor intact and, by doing so, preserves its state.

Fixes: TBD ("ring-buffer: Set new size of the ring buffer sub page")
Signed-off-by: Steven Rostedt (Google)
---
 kernel/trace/ring_buffer.c | 82 +++++++++++++++++++++++++++++++-------
 1 file changed, 68 insertions(+), 14 deletions(-)
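For review, the fix's structure reduces to a standalone two-phase sketch
(made-up names; plain malloc()ed singly linked lists stand in for the
buffer_page rings, so this is the pattern, not the ring buffer code):
stage replacement pages for every CPU first, commit only when every
allocation has succeeded, and on failure free whatever was staged while
leaving the live buffers untouched:

	#include <stdlib.h>

	struct page_node {
		struct page_node *next;
		void *data;
	};

	struct cpu_buf {
		struct page_node *pages;	/* live pages */
		struct page_node *new_pages;	/* staged replacement pages */
	};

	static void free_list(struct page_node *p)
	{
		while (p) {
			struct page_node *next = p->next;
			free(p->data);
			free(p);
			p = next;
		}
	}

	static struct page_node *alloc_list(int nr, size_t size)
	{
		struct page_node *head = NULL;

		while (nr--) {
			struct page_node *p = malloc(sizeof(*p));
			if (!p)
				goto fail;
			p->data = malloc(size);
			if (!p->data) {
				free(p);
				goto fail;
			}
			p->next = head;
			head = p;
		}
		return head;
	fail:
		free_list(head);
		return NULL;
	}

	/* Returns 0 on success; on failure returns -1 with all buffers untouched */
	int set_page_size(struct cpu_buf *bufs, int nr_cpus, int nr_pages, size_t size)
	{
		int cpu;

		/* Phase 1: allocate for every CPU before touching anything */
		for (cpu = 0; cpu < nr_cpus; cpu++) {
			bufs[cpu].new_pages = alloc_list(nr_pages, size);
			if (!bufs[cpu].new_pages)
				goto error;
		}

		/* Phase 2: all allocations succeeded; swap in the new pages */
		for (cpu = 0; cpu < nr_cpus; cpu++) {
			free_list(bufs[cpu].pages);
			bufs[cpu].pages = bufs[cpu].new_pages;
			bufs[cpu].new_pages = NULL;
		}
		return 0;

	error:
		while (cpu--) {
			free_list(bufs[cpu].new_pages);
			bufs[cpu].new_pages = NULL;
		}
		return -1;
	}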
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 4726deccd997..bbefb5838e64 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -5989,7 +5989,8 @@ EXPORT_SYMBOL_GPL(ring_buffer_subbuf_order_get);
  */
 int ring_buffer_subbuf_order_set(struct trace_buffer *buffer, int order)
 {
-	struct ring_buffer_per_cpu **cpu_buffers;
+	struct ring_buffer_per_cpu *cpu_buffer;
+	struct buffer_page *bpage, *tmp;
 	int old_order, old_size;
 	int nr_pages;
 	int psize;
@@ -6008,9 +6009,6 @@ int ring_buffer_subbuf_order_set(struct trace_buffer *buffer, int order)
 		return -EINVAL;
 
 	bsize = sizeof(void *) * buffer->cpus;
-	cpu_buffers = kzalloc(bsize, GFP_KERNEL);
-	if (!cpu_buffers)
-		return -ENOMEM;
 
 	old_order = buffer->subbuf_order;
 	old_size = buffer->subbuf_size;
@@ -6027,33 +6025,85 @@ int ring_buffer_subbuf_order_set(struct trace_buffer *buffer, int order)
 
 	/* Make sure all new buffers are allocated, before deleting the old ones */
 	for_each_buffer_cpu(buffer, cpu) {
+
 		if (!cpumask_test_cpu(cpu, buffer->cpumask))
 			continue;
 
+		cpu_buffer = buffer->buffers[cpu];
+
 		/* Update the number of pages to match the new size */
 		nr_pages = old_size * buffer->buffers[cpu]->nr_pages;
 		nr_pages = DIV_ROUND_UP(nr_pages, buffer->subbuf_size);
 
-		cpu_buffers[cpu] = rb_allocate_cpu_buffer(buffer, nr_pages, cpu);
-		if (!cpu_buffers[cpu]) {
+		/* we need a minimum of two pages */
+		if (nr_pages < 2)
+			nr_pages = 2;
+
+		cpu_buffer->nr_pages_to_update = nr_pages;
+
+		/* Include the reader page */
+		nr_pages++;
+
+		/* Allocate the new size buffer */
+		INIT_LIST_HEAD(&cpu_buffer->new_pages);
+		if (__rb_allocate_pages(cpu_buffer, nr_pages,
+					&cpu_buffer->new_pages)) {
+			/* not enough memory for new pages */
 			err = -ENOMEM;
 			goto error;
 		}
 	}
 
 	for_each_buffer_cpu(buffer, cpu) {
+
 		if (!cpumask_test_cpu(cpu, buffer->cpumask))
 			continue;
 
-		rb_free_cpu_buffer(buffer->buffers[cpu]);
-		buffer->buffers[cpu] = cpu_buffers[cpu];
+		cpu_buffer = buffer->buffers[cpu];
+
+		/* Clear the head bit to make the linked list normal to read */
+		rb_head_page_deactivate(cpu_buffer);
+
+		/* Now walk the list and free all the old sub buffers */
+		list_for_each_entry_safe(bpage, tmp, cpu_buffer->pages, list) {
+			list_del_init(&bpage->list);
+			free_buffer_page(bpage);
+		}
+		/* The above loop stopped at the last page needing to be freed */
+		bpage = list_entry(cpu_buffer->pages, struct buffer_page, list);
+		free_buffer_page(bpage);
+
+		/* Free the current reader page */
+		free_buffer_page(cpu_buffer->reader_page);
+
+		/* One page was allocated for the reader page */
+		cpu_buffer->reader_page = list_entry(cpu_buffer->new_pages.next,
+						     struct buffer_page, list);
+		list_del_init(&cpu_buffer->reader_page->list);
+
+		/* The cpu_buffer pages are a linked list with no head */
+		cpu_buffer->pages = cpu_buffer->new_pages.next;
+		cpu_buffer->new_pages.next->prev = cpu_buffer->new_pages.prev;
+		cpu_buffer->new_pages.prev->next = cpu_buffer->new_pages.next;
+
+		/* Clear the new_pages list */
+		INIT_LIST_HEAD(&cpu_buffer->new_pages);
+
+		cpu_buffer->head_page
+			= list_entry(cpu_buffer->pages, struct buffer_page, list);
+		cpu_buffer->tail_page = cpu_buffer->commit_page = cpu_buffer->head_page;
+
+		cpu_buffer->nr_pages = cpu_buffer->nr_pages_to_update;
+		cpu_buffer->nr_pages_to_update = 0;
+
+		rb_head_page_activate(cpu_buffer);
+
+		rb_check_pages(cpu_buffer);
 	}
 
 	atomic_dec(&buffer->record_disabled);
 	mutex_unlock(&buffer->mutex);
 
-	kfree(cpu_buffers);
-
 	return 0;
 
 error:
@@ -6064,12 +6114,16 @@ int ring_buffer_subbuf_order_set(struct trace_buffer *buffer, int order)
 	mutex_unlock(&buffer->mutex);
 
 	for_each_buffer_cpu(buffer, cpu) {
-		if (!cpu_buffers[cpu])
+		cpu_buffer = buffer->buffers[cpu];
+
+		if (!cpu_buffer->nr_pages_to_update)
 			continue;
-		rb_free_cpu_buffer(cpu_buffers[cpu]);
-		kfree(cpu_buffers[cpu]);
+
+		list_for_each_entry_safe(bpage, tmp, &cpu_buffer->new_pages, list) {
+			list_del_init(&bpage->list);
+			free_buffer_page(bpage);
+		}
 	}
-	kfree(cpu_buffers);
 
 	return err;
 }
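One detail of the second loop deserves a note: cpu_buffer->pages is a
circular doubly linked list with no dedicated head node, so adopting the
staged pages means unlinking the new_pages list_head from the ring it
anchors (the three pointer assignments under "linked list with no head"
above). A standalone illustration of that unlink, using a hand-rolled
list rather than <linux/list.h>:

	#include <stdio.h>

	struct list_head {
		struct list_head *next, *prev;
	};

	int main(void)
	{
		struct list_head head, a, b, c;

		/* Build head <-> a <-> b <-> c <-> head (circular) */
		head.next = &a; a.prev = &head;
		a.next = &b;    b.prev = &a;
		b.next = &c;    c.prev = &b;
		c.next = &head; head.prev = &c;

		/* Adopt the nodes as a headless ring, as the patch does */
		struct list_head *pages = head.next;	/* first real node */
		head.next->prev = head.prev;		/* a->prev = c */
		head.prev->next = head.next;		/* c->next = a */

		/* Walk the ring once: visits a, b, c, then wraps back to a */
		struct list_head *p = pages;
		int n = 0;
		do {
			n++;
			p = p->next;
		} while (p != pages);
		printf("ring has %d nodes\n", n);	/* prints 3 */
		return 0;
	}

With the sentinel unlinked, every node left on the ring is a real buffer
page, which is why head_page, tail_page and commit_page are reseeded
immediately afterward.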