Message ID | 20230208090814.869242-1-zhengyejian1@huawei.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:eb09:0:0:0:0:0 with SMTP id s9csp3346003wrn; Wed, 8 Feb 2023 01:07:43 -0800 (PST) X-Google-Smtp-Source: AK7set8IGo/efxcCA6TaYUDBF7bzEc57dYf8N8F/DOwQIDmYfo9CqRdRGroLHUXxeHfOoQtC1KTv X-Received: by 2002:a17:907:160e:b0:8aa:c035:a651 with SMTP id hb14-20020a170907160e00b008aac035a651mr3895538ejc.37.1675847263174; Wed, 08 Feb 2023 01:07:43 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1675847263; cv=none; d=google.com; s=arc-20160816; b=tojcmULZSIgx0W9pXLeKsjOZkEq8zA9qgQHFRMo/0I6wbitL57Fw6Ecxx1OzoEBOqs 7xICfxb641kBMly0UsAP7+OniP4RH39QINmS5Ov6MYVw1NuNPNTq65nGli9Z+4KcEr2q TJJ3+fafIdBxRe2IiF7mRl9+MTuibMhYAcO74jNedln37Vxy+4s/0JQbj6lYn/BKmMk+ FA5tcQP62H3q1nu+XNq0O3fCrMEVMRWd2//mR39+HCWr23zrzPGqpHKxJ8V5J1NSI+A3 lmT0P5WEhvn/p+RzGRp+Kn1jmWYxz2JtlBGh/pYSpG0dW38P+CxPjA3tmn6O38GI5T7B ny3Q== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :message-id:date:subject:cc:to:from; bh=HjEgjXS+wYQP7yPZsZIpCQIOS2IYn8iquhJ5DVxM8tU=; b=KVgg0WFGMTsq5VGex6PfAiLCRmb6Qktorovnm/L5/bpWfdn/q5512FFUiqpB1IWgo5 pGP+XqsUaMEzGozOLBAy0EaQGFRbJrpR9OFMfVvY5hidJNEU65EulVqi+2REY0xo/MXa uofug1M8gsiztEcTCKe27PK5O30LoSpuoC4F1eG8FlZFmayCtRa/OGrcmWKvacDKE+/t 7C4LL57Fdylzj7Fkv21a7i5c7YgEwWKWC6LD35WUip5JDicynmCzLOVlw5bfLs2c9afR L+n3iC46WEdfPGAwIWZSaDnSWD33LvvwM/FYSqaQ6p0vEAS8cFnBfSWd7fsbwZ+SAj0w yUEg== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=huawei.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id uj4-20020a170907c98400b007c18e8131e1si1518186ejc.744.2023.02.08.01.07.20; Wed, 08 Feb 2023 01:07:43 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=huawei.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230388AbjBHJG0 (ORCPT <rfc822;kmanaouilinux@gmail.com> + 99 others); Wed, 8 Feb 2023 04:06:26 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52118 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229490AbjBHJGV (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Wed, 8 Feb 2023 04:06:21 -0500 Received: from szxga03-in.huawei.com (szxga03-in.huawei.com [45.249.212.189]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 122031CAF9; Wed, 8 Feb 2023 01:06:18 -0800 (PST) Received: from dggpeml100012.china.huawei.com (unknown [172.30.72.53]) by szxga03-in.huawei.com (SkyGuard) with ESMTP id 4PBYs03BbRzJqyM; Wed, 8 Feb 2023 17:01:40 +0800 (CST) Received: from localhost.localdomain (10.67.175.61) by dggpeml100012.china.huawei.com (7.185.36.121) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.34; Wed, 8 Feb 2023 17:06:16 +0800 From: Zheng Yejian <zhengyejian1@huawei.com> To: <rostedt@goodmis.org>, <mhiramat@kernel.org> CC: <linux-kernel@vger.kernel.org>, <linux-trace-kernel@vger.kernel.org>, <zhengyejian1@huawei.com>, <wanghai38@huawei.com> Subject: [PATCH] tracing/ring-buffer: Remove integrity check at end of iter read Date: Wed, 8 Feb 2023 17:08:14 +0800 Message-ID: <20230208090814.869242-1-zhengyejian1@huawei.com> X-Mailer: git-send-email 2.25.1 MIME-Version: 1.0 Content-Transfer-Encoding: 7BIT Content-Type: text/plain; charset=US-ASCII X-Originating-IP: [10.67.175.61] X-ClientProxiedBy: dggems702-chm.china.huawei.com (10.3.19.179) To dggpeml100012.china.huawei.com (7.185.36.121) X-CFilter-Loop: Reflected X-Spam-Status: No, score=-4.2 required=5.0 tests=BAYES_00,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1757253219651834364?= X-GMAIL-MSGID: =?utf-8?q?1757253219651834364?= |
Series |
tracing/ring-buffer: Remove integrity check at end of iter read
|
|
Commit Message
Zheng Yejian
Feb. 8, 2023, 9:08 a.m. UTC
Concurrently closing "trace" file and writing into ring buffer [1] can cause WARNINGs [2]. It has been reported in Link: https://lore.kernel.org/all/20230203035608.2336906-1-zhengyejian1@huawei.com/ It seems a data race between ring_buffer writing and integrity check. That is, RB_FLAG of head_page is been updating, while at same time RB_FLAG was cleared when doing integrity check: rb_check_pages() rb_handle_head_page(): -------- -------- rb_head_page_deactivate() rb_head_page_set_normal() rb_head_page_activate() Integrity check at end of iter read was added since commit 659f451ff213 ("ring-buffer: Add integrity check at end of iter read"). As it's commit message said: > As reading via an iterator requires disabling the ring buffer, it > is a perfect place to have it. However, since commit 1039221cc278 ("ring-buffer: Do not disable recording when there is an iterator"), ring buffer was not disabled at that place, so that integrity check should be removed. 1: ``` read_trace.sh while true; do # the "trace" file is closed after read head -1 /sys/kernel/tracing/trace > /dev/null done ``` ``` repro.sh sysctl -w kernel.panic_on_warn=1 # function tracer will writing enough data into ring_buffer echo function > /sys/kernel/tracing/current_tracer ./read_trace.sh & ./read_trace.sh & ./read_trace.sh & ./read_trace.sh & ./read_trace.sh & ./read_trace.sh & ./read_trace.sh & ./read_trace.sh & ``` 2: ------------[ cut here ]------------ WARNING: CPU: 9 PID: 62 at kernel/trace/ring_buffer.c:2653 rb_move_tail+0x450/0x470 Modules linked in: CPU: 9 PID: 62 Comm: ksoftirqd/9 Tainted: G W 6.2.0-rc6+ Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014 RIP: 0010:rb_move_tail+0x450/0x470 Code: ff ff 4c 89 c8 f0 4d 0f b1 02 48 89 c2 48 83 e2 fc 49 39 d0 75 24 83 e0 03 83 f8 02 0f 84 e1 fb ff ff 48 8b 57 10 f0 ff 42 08 <0f> 0b 83 f8 02 0f 84 ce fb ff ff e9 db RSP: 0018:ffffb5564089bd00 EFLAGS: 00000203 RAX: 0000000000000000 RBX: ffff9db385a2bf81 RCX: ffffb5564089bd18 RDX: ffff9db281110100 RSI: 0000000000000fe4 RDI: ffff9db380145400 RBP: ffff9db385a2bf80 R08: ffff9db385a2bfc0 R09: ffff9db385a2bfc2 R10: ffff9db385a6c000 R11: ffff9db385a2bf80 R12: 0000000000000000 R13: 00000000000003e8 R14: ffff9db281110100 R15: ffffffffbb006108 FS: 0000000000000000(0000) GS:ffff9db3bdcc0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00005602323024c8 CR3: 0000000022e0c000 CR4: 00000000000006e0 Call Trace: <TASK> ring_buffer_lock_reserve+0x136/0x360 ? __do_softirq+0x287/0x2df ? __pfx_rcu_softirq_qs+0x10/0x10 trace_function+0x21/0x110 ? __pfx_rcu_softirq_qs+0x10/0x10 ? __do_softirq+0x287/0x2df function_trace_call+0xf6/0x120 0xffffffffc038f097 ? rcu_softirq_qs+0x5/0x140 rcu_softirq_qs+0x5/0x140 __do_softirq+0x287/0x2df run_ksoftirqd+0x2a/0x30 smpboot_thread_fn+0x188/0x220 ? __pfx_smpboot_thread_fn+0x10/0x10 kthread+0xe7/0x110 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x2c/0x50 </TASK> ---[ end trace 0000000000000000 ]--- Fixes: 1039221cc278 ("ring-buffer: Do not disable recording when there is an iterator") Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com> --- kernel/trace/ring_buffer.c | 11 ----------- 1 file changed, 11 deletions(-)
Comments
On Wed, 8 Feb 2023 17:08:14 +0800 Zheng Yejian <zhengyejian1@huawei.com> wrote: > Concurrently closing "trace" file and writing into ring buffer [1] can > cause WARNINGs [2]. It has been reported in > Link: https://lore.kernel.org/all/20230203035608.2336906-1-zhengyejian1@huawei.com/ > > It seems a data race between ring_buffer writing and integrity check. > That is, RB_FLAG of head_page is been updating, while at same time RB_FLAG > was cleared when doing integrity check: > rb_check_pages() rb_handle_head_page(): > -------- -------- > rb_head_page_deactivate() > rb_head_page_set_normal() > rb_head_page_activate() > Good catch! > Integrity check at end of iter read was added since commit 659f451ff213 > ("ring-buffer: Add integrity check at end of iter read"). As it's commit > message said: > > As reading via an iterator requires disabling the ring buffer, it > > is a perfect place to have it. > However, since commit 1039221cc278 ("ring-buffer: Do not disable recording > when there is an iterator"), ring buffer was not disabled at that place, > so that integrity check should be removed. > > 1: > ``` read_trace.sh > while true; > do > # the "trace" file is closed after read > head -1 /sys/kernel/tracing/trace > /dev/null > done > ``` > ``` repro.sh > sysctl -w kernel.panic_on_warn=1 > # function tracer will writing enough data into ring_buffer > echo function > /sys/kernel/tracing/current_tracer > ./read_trace.sh & > ./read_trace.sh & > ./read_trace.sh & > ./read_trace.sh & > ./read_trace.sh & > ./read_trace.sh & > ./read_trace.sh & > ./read_trace.sh & > ``` > > Fixes: 1039221cc278 ("ring-buffer: Do not disable recording when there is an iterator") > Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com> > --- > kernel/trace/ring_buffer.c | 11 ----------- > 1 file changed, 11 deletions(-) > > diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c > index c366a0a9ddba..34e955bd1e59 100644 > --- a/kernel/trace/ring_buffer.c > +++ b/kernel/trace/ring_buffer.c > @@ -5203,17 +5203,6 @@ void > ring_buffer_read_finish(struct ring_buffer_iter *iter) > { > struct ring_buffer_per_cpu *cpu_buffer = iter->cpu_buffer; > - unsigned long flags; > - > - /* > - * Ring buffer is disabled from recording, here's a good place > - * to check the integrity of the ring buffer. > - * Must prevent readers from trying to read, as the check > - * clears the HEAD page and readers require it. > - */ > - raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags); > - rb_check_pages(cpu_buffer); > - raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags); I would rather find a way to make this still work than just removing it. Perhaps there's no reason to clear the flags, and change rb_check_pages() to mask them out before testing. Something like: static int rb_check_pages(struct ring_buffer_per_cpu *cpu_buffer) { struct list_head *head = cpu_buffer->pages; struct buffer_page *bpage, *tmp; if (RB_WARN_ON(cpu_buffer, rb_list_head(rb_list_head(head->next)->prev) != head)) return -1; if (RB_WARN_ON(cpu_buffer, rb_list_head(rb_list_head(head->prev)->next) != head)) return -1; if (rb_check_list(cpu_buffer, head)) return -1; list_for_each_entry_safe(bpage, tmp, head, list) { if (RB_WARN_ON(cpu_buffer, rb_list_head(rb_list_head(bpage->list.next)->prev) != &bpage->list)) return -1; if (RB_WARN_ON(cpu_buffer, rb_list_head(rb_list_head(bpage->list.prev)->next) != &bpage->list)) return -1; if (rb_check_list(cpu_buffer, &bpage->list)) return -1; } return 0; } I haven't tested the above. ? -- Steve > > atomic_dec(&cpu_buffer->resize_disabled); > kfree(iter->event);
On 2023/2/9 06:36, Steven Rostedt wrote: > On Wed, 8 Feb 2023 17:08:14 +0800 > Zheng Yejian <zhengyejian1@huawei.com> wrote: > >> Concurrently closing "trace" file and writing into ring buffer [1] can >> cause WARNINGs [2]. It has been reported in >> Link: https://lore.kernel.org/all/20230203035608.2336906-1-zhengyejian1@huawei.com/ >> >> It seems a data race between ring_buffer writing and integrity check. >> That is, RB_FLAG of head_page is been updating, while at same time RB_FLAG >> was cleared when doing integrity check: >> rb_check_pages() rb_handle_head_page(): >> -------- -------- >> rb_head_page_deactivate() >> rb_head_page_set_normal() >> rb_head_page_activate() >> > > Good catch! Thanks! > >> Integrity check at end of iter read was added since commit 659f451ff213 >> ("ring-buffer: Add integrity check at end of iter read"). As it's commit >> message said: >> > As reading via an iterator requires disabling the ring buffer, it >> > is a perfect place to have it. >> However, since commit 1039221cc278 ("ring-buffer: Do not disable recording >> when there is an iterator"), ring buffer was not disabled at that place, >> so that integrity check should be removed. >> >> 1: >> ``` read_trace.sh >> while true; >> do >> # the "trace" file is closed after read >> head -1 /sys/kernel/tracing/trace > /dev/null >> done >> ``` >> ``` repro.sh >> sysctl -w kernel.panic_on_warn=1 >> # function tracer will writing enough data into ring_buffer >> echo function > /sys/kernel/tracing/current_tracer >> ./read_trace.sh & >> ./read_trace.sh & >> ./read_trace.sh & >> ./read_trace.sh & >> ./read_trace.sh & >> ./read_trace.sh & >> ./read_trace.sh & >> ./read_trace.sh & >> ``` >> > > >> Fixes: 1039221cc278 ("ring-buffer: Do not disable recording when there is an iterator") >> Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com> >> --- >> kernel/trace/ring_buffer.c | 11 ----------- >> 1 file changed, 11 deletions(-) >> >> diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c >> index c366a0a9ddba..34e955bd1e59 100644 >> --- a/kernel/trace/ring_buffer.c >> +++ b/kernel/trace/ring_buffer.c >> @@ -5203,17 +5203,6 @@ void >> ring_buffer_read_finish(struct ring_buffer_iter *iter) >> { >> struct ring_buffer_per_cpu *cpu_buffer = iter->cpu_buffer; >> - unsigned long flags; >> - >> - /* >> - * Ring buffer is disabled from recording, here's a good place >> - * to check the integrity of the ring buffer. >> - * Must prevent readers from trying to read, as the check >> - * clears the HEAD page and readers require it. >> - */ >> - raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags); >> - rb_check_pages(cpu_buffer); >> - raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags); > > I would rather find a way to make this still work than just removing it. Yes, we can try to find the way. > > Perhaps there's no reason to clear the flags, and change rb_check_pages() > to mask them out before testing. Something like: > > static int rb_check_pages(struct ring_buffer_per_cpu *cpu_buffer) > { > struct list_head *head = cpu_buffer->pages; > struct buffer_page *bpage, *tmp; > > if (RB_WARN_ON(cpu_buffer, rb_list_head(rb_list_head(head->next)->prev) != head)) > return -1; > if (RB_WARN_ON(cpu_buffer, rb_list_head(rb_list_head(head->prev)->next) != head)) > return -1; > > if (rb_check_list(cpu_buffer, head)) rb_check_list() expect to check a page with RB_FLAG being cleared, but in this solution, rb_head_page_deactivate() is not called before, so we may not call it directly? The same problem with below check for "bpage->list". > return -1; > > list_for_each_entry_safe(bpage, tmp, head, list) { I'd like to know if there is a case that "head" happens to be a "reader_page", and the ring buffer is not exactly being traversed? > if (RB_WARN_ON(cpu_buffer, > rb_list_head(rb_list_head(bpage->list.next)->prev) != &bpage->list)) > return -1; > if (RB_WARN_ON(cpu_buffer, > rb_list_head(rb_list_head(bpage->list.prev)->next) != &bpage->list)) > return -1; > if (rb_check_list(cpu_buffer, &bpage->list)) > return -1; > } > > return 0; > } > > I haven't tested the above. > > ? > > -- Steve > > >> >> atomic_dec(&cpu_buffer->resize_disabled); >> kfree(iter->event); >
On 2023/2/9 06:36, Steven Rostedt wrote: >> On Wed, 8 Feb 2023 17:08:14 +0800 >> Zheng Yejian <zhengyejian1@huawei.com> wrote: >> >>> Concurrently closing "trace" file and writing into ring buffer [1] can >>> cause WARNINGs [2]. It has been reported in >>> Link: https://lore.kernel.org/all/20230203035608.2336906-1-zhengyejian1@huawei.com/ >>> >>> It seems a data race between ring_buffer writing and integrity check. >>> That is, RB_FLAG of head_page is been updating, while at same time RB_FLAG >>> was cleared when doing integrity check: >>> rb_check_pages() rb_handle_head_page(): >>> -------- -------- >>> rb_head_page_deactivate() >>> rb_head_page_set_normal() >>> rb_head_page_activate() >>> >> >> Good catch! > >Thanks! > >> >>> Integrity check at end of iter read was added since commit 659f451ff213 >>> ("ring-buffer: Add integrity check at end of iter read"). As it's commit >>> message said: >>> > As reading via an iterator requires disabling the ring buffer, it >>> > is a perfect place to have it. >>> However, since commit 1039221cc278 ("ring-buffer: Do not disable recording >>> when there is an iterator"), ring buffer was not disabled at that place, >>> so that integrity check should be removed. >>> >>> 1: >>> ``` read_trace.sh >>> while true; >>> do >>> # the "trace" file is closed after read >>> head -1 /sys/kernel/tracing/trace > /dev/null >>> done >>> ``` >>> ``` repro.sh >>> sysctl -w kernel.panic_on_warn=1 >>> # function tracer will writing enough data into ring_buffer >>> echo function > /sys/kernel/tracing/current_tracer >>> ./read_trace.sh & >>> ./read_trace.sh & >>> ./read_trace.sh & >>> ./read_trace.sh & >>> ./read_trace.sh & >>> ./read_trace.sh & >>> ./read_trace.sh & >>> ./read_trace.sh & >>> ``` >>> >> >> >>> Fixes: 1039221cc278 ("ring-buffer: Do not disable recording when there is an iterator") >>> Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com> >>> --- >>> kernel/trace/ring_buffer.c | 11 ----------- >>> 1 file changed, 11 deletions(-) >>> >>> diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c >>> index c366a0a9ddba..34e955bd1e59 100644 >>> --- a/kernel/trace/ring_buffer.c >>> +++ b/kernel/trace/ring_buffer.c >>> @@ -5203,17 +5203,6 @@ void >>> ring_buffer_read_finish(struct ring_buffer_iter *iter) >>> { >>> struct ring_buffer_per_cpu *cpu_buffer = iter->cpu_buffer; >>> - unsigned long flags; >>> - >>> - /* >>> - * Ring buffer is disabled from recording, here's a good place >>> - * to check the integrity of the ring buffer. >>> - * Must prevent readers from trying to read, as the check >>> - * clears the HEAD page and readers require it. >>> - */ >>> - raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags); >>> - rb_check_pages(cpu_buffer); >>> - raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags); >> >> I would rather find a way to make this still work than just removing it. > >Yes, we can try to find the way. > >> >> Perhaps there's no reason to clear the flags, and change rb_check_pages() >> to mask them out before testing. Something like: >> >> static int rb_check_pages(struct ring_buffer_per_cpu *cpu_buffer) >> { >> struct list_head *head = cpu_buffer->pages; >> struct buffer_page *bpage, *tmp; >> >> if (RB_WARN_ON(cpu_buffer, rb_list_head(rb_list_head(head->next)->prev) != head)) >> return -1; >> if (RB_WARN_ON(cpu_buffer, rb_list_head(rb_list_head(head->prev)->next) != head)) >> return -1; >> >> if (rb_check_list(cpu_buffer, head)) > >rb_check_list() expect to check a page with RB_FLAG being cleared, >but in this solution, rb_head_page_deactivate() is not called before, >so we may not call it directly? The same problem with below check for >"bpage->list". Correct. >> return -1; >> >> list_for_each_entry_safe(bpage, tmp, head, list) { > >I'd like to know if there is a case that "head" happens to be a >"reader_page", and the ring buffer is not exactly being traversed? In my issue, i see below callstack and it seem to be spinning inside rb_list_head_clear() as cpu_buffer->pages has duplicate entry in the list. -00 |rb_list_head_clear(inline) -00 |rb_head_page_deactivate(inline) -00 |rb_check_pages(cpu_buffer = 0xFFFFFF89E0C3B200) -01 |atomic_try_cmpxchg_acquire(inline) -01 |queued_spin_lock(inline) -01 |do_raw_spin_lock_flags(inline) -01 |__raw_spin_lock_irqsave(inline) -01 |_raw_spin_lock_irqsave(inline) -01 |ring_buffer_read_finish(iter = 0xFFFFFF8006FE3780) -02 |cpumask_next(inline) -02 |tracing_release(inode = ?, file = 0xFFFFFF8A53A63F00) -03 |__fput(file = 0xFFFFFF8A53A63F00) -04 |____fput(work = ?) -05 |_raw_spin_unlock_irq(inline) -05 |task_work_run() -06 |tracehook_notify_resume(inline) -06 |do_notify_resume(regs = 0xFFFFFFC06ADC8EB0, thread_flags = 67108868) -07 |prepare_exit_to_user_mode(inline) -07 |exit_to_user_mode(inline) -07 |el0_svc(regs = 0xFFFFFFC06ADC8EB0) -08 |el0t_64_sync_handler(regs = ?) -09 |el0t_64_sync(asm) ... .. ffffff80359eeb00 --> Duplicate entry ffffff80359ee300 ffffff80359ee180 ffffff80359eeec0 ffffff80359eec00 ffffff80359ee800 -- Tail page ffffff80359eedc0 -- Head page ffffff80359ee640 ffffff80359ee080 ffffff80359ee700 ffffff80359ee7c0 ffffff80359eed80 ffffff80359ee900 ffffff80359ee9c0 ffffff80359eea00 ffffff80359eea80 ffffff80359eec80 ffffff80359ee240 ffffff80359ee6c0 ffffff80359ee0c0 ffffff80359ee8c0 ffffff80359ee940 ffffff80359eee00 ffffff80359ee000 ffffff80359eeb00 ---> Duplicate entry -Mukesh > >> if (RB_WARN_ON(cpu_buffer, >> rb_list_head(rb_list_head(bpage->list.next)->prev) != &bpage->list)) >> return -1; >> if (RB_WARN_ON(cpu_buffer, >> rb_list_head(rb_list_head(bpage->list.prev)->next) != &bpage->list)) >> return -1; >> if (rb_check_list(cpu_buffer, &bpage->list)) >> return -1; >> } >> >> return 0; >> } >> >> I haven't tested the above. >> >> ? >> >> -- Steve >> >> >>> >>> atomic_dec(&cpu_buffer->resize_disabled); >>> kfree(iter->event);
On Fri, 10 Feb 2023 20:52:36 +0530 Mukesh Ojha <quic_mojha@quicinc.com> wrote: > >> return -1; > >> > >> list_for_each_entry_safe(bpage, tmp, head, list) { > > > >I'd like to know if there is a case that "head" happens to be a > >"reader_page", and the ring buffer is not exactly being traversed? No, the way it works is that the reader page is found by searching for the head pointer, and then it is set when swapped. Basically, the pseudo code is: reader->next = head_page | HEAD_FLAG val = head_page->prev->next val &= ~FLAGS val |= HEAD_FLAG cmpxchg(head_page->prev->next, val, reader) The HEAD_FLAG is always on the pointer that points to the head page that gets swapped. This will never point to the reader page, as that would mean the writer has access to it. > > In my issue, i see below callstack and it seem to be spinning inside rb_list_head_clear() as > cpu_buffer->pages has duplicate entry in the list. > > -00 |rb_list_head_clear(inline) > -00 |rb_head_page_deactivate(inline) > -00 |rb_check_pages(cpu_buffer = 0xFFFFFF89E0C3B200) > -01 |atomic_try_cmpxchg_acquire(inline) > -01 |queued_spin_lock(inline) > -01 |do_raw_spin_lock_flags(inline) > -01 |__raw_spin_lock_irqsave(inline) > -01 |_raw_spin_lock_irqsave(inline) > -01 |ring_buffer_read_finish(iter = 0xFFFFFF8006FE3780) > -02 |cpumask_next(inline) > -02 |tracing_release(inode = ?, file = 0xFFFFFF8A53A63F00) > -03 |__fput(file = 0xFFFFFF8A53A63F00) > -04 |____fput(work = ?) > -05 |_raw_spin_unlock_irq(inline) > -05 |task_work_run() > -06 |tracehook_notify_resume(inline) > -06 |do_notify_resume(regs = 0xFFFFFFC06ADC8EB0, thread_flags = 67108868) > -07 |prepare_exit_to_user_mode(inline) > -07 |exit_to_user_mode(inline) > -07 |el0_svc(regs = 0xFFFFFFC06ADC8EB0) > -08 |el0t_64_sync_handler(regs = ?) > -09 |el0t_64_sync(asm) > > ... > .. > ffffff80359eeb00 --> Duplicate entry > ffffff80359ee300 > ffffff80359ee180 > ffffff80359eeec0 > ffffff80359eec00 > ffffff80359ee800 -- Tail page > ffffff80359eedc0 -- Head page > ffffff80359ee640 > ffffff80359ee080 > ffffff80359ee700 > ffffff80359ee7c0 > ffffff80359eed80 > ffffff80359ee900 > ffffff80359ee9c0 > ffffff80359eea00 > ffffff80359eea80 > ffffff80359eec80 > ffffff80359ee240 > ffffff80359ee6c0 > ffffff80359ee0c0 > ffffff80359ee8c0 > ffffff80359ee940 > ffffff80359eee00 > ffffff80359ee000 > ffffff80359eeb00 ---> Duplicate entry So this is a separate issue where the ring buffer is corrupted? -- Steve
On 2/11/2023 10:07 PM, Steven Rostedt wrote: > On Fri, 10 Feb 2023 20:52:36 +0530 > Mukesh Ojha <quic_mojha@quicinc.com> wrote: > >>>> return -1; >>>> >>>> list_for_each_entry_safe(bpage, tmp, head, list) { >>> >>> I'd like to know if there is a case that "head" happens to be a >>> "reader_page", and the ring buffer is not exactly being traversed? > > No, the way it works is that the reader page is found by searching for the > head pointer, and then it is set when swapped. Basically, the pseudo code > is: > > reader->next = head_page | HEAD_FLAG > val = head_page->prev->next > val &= ~FLAGS > val |= HEAD_FLAG > cmpxchg(head_page->prev->next, val, reader) > > The HEAD_FLAG is always on the pointer that points to the head page that > gets swapped. This will never point to the reader page, as that would mean > the writer has access to it. > >> >> In my issue, i see below callstack and it seem to be spinning inside rb_list_head_clear() as >> cpu_buffer->pages has duplicate entry in the list. >> >> -00 |rb_list_head_clear(inline) >> -00 |rb_head_page_deactivate(inline) >> -00 |rb_check_pages(cpu_buffer = 0xFFFFFF89E0C3B200) >> -01 |atomic_try_cmpxchg_acquire(inline) >> -01 |queued_spin_lock(inline) >> -01 |do_raw_spin_lock_flags(inline) >> -01 |__raw_spin_lock_irqsave(inline) >> -01 |_raw_spin_lock_irqsave(inline) >> -01 |ring_buffer_read_finish(iter = 0xFFFFFF8006FE3780) >> -02 |cpumask_next(inline) >> -02 |tracing_release(inode = ?, file = 0xFFFFFF8A53A63F00) >> -03 |__fput(file = 0xFFFFFF8A53A63F00) >> -04 |____fput(work = ?) >> -05 |_raw_spin_unlock_irq(inline) >> -05 |task_work_run() >> -06 |tracehook_notify_resume(inline) >> -06 |do_notify_resume(regs = 0xFFFFFFC06ADC8EB0, thread_flags = 67108868) >> -07 |prepare_exit_to_user_mode(inline) >> -07 |exit_to_user_mode(inline) >> -07 |el0_svc(regs = 0xFFFFFFC06ADC8EB0) >> -08 |el0t_64_sync_handler(regs = ?) >> -09 |el0t_64_sync(asm) >> >> ... >> .. >> ffffff80359eeb00 --> Duplicate entry >> ffffff80359ee300 >> ffffff80359ee180 >> ffffff80359eeec0 >> ffffff80359eec00 >> ffffff80359ee800 -- Tail page >> ffffff80359eedc0 -- Head page >> ffffff80359ee640 >> ffffff80359ee080 >> ffffff80359ee700 >> ffffff80359ee7c0 >> ffffff80359eed80 >> ffffff80359ee900 >> ffffff80359ee9c0 >> ffffff80359eea00 >> ffffff80359eea80 >> ffffff80359eec80 >> ffffff80359ee240 >> ffffff80359ee6c0 >> ffffff80359ee0c0 >> ffffff80359ee8c0 >> ffffff80359ee940 >> ffffff80359eee00 >> ffffff80359ee000 >> ffffff80359eeb00 ---> Duplicate entry > > So this is a separate issue where the ring buffer is corrupted? It looks to be different issue and there also i see similar call stack of tracing_release() but in that issue it is looping forever in deactivate call due to list corruption. I am not yet able to root cause a place of corruption as it is reproduced only once, will need to check more on this. For this issue, i have posted at https://lore.kernel.org/lkml/1676376403-16462-1-git-send-email-quic_mojha@quicinc.com/ -Mukesh > > -- Steve
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c index c366a0a9ddba..34e955bd1e59 100644 --- a/kernel/trace/ring_buffer.c +++ b/kernel/trace/ring_buffer.c @@ -5203,17 +5203,6 @@ void ring_buffer_read_finish(struct ring_buffer_iter *iter) { struct ring_buffer_per_cpu *cpu_buffer = iter->cpu_buffer; - unsigned long flags; - - /* - * Ring buffer is disabled from recording, here's a good place - * to check the integrity of the ring buffer. - * Must prevent readers from trying to read, as the check - * clears the HEAD page and readers require it. - */ - raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags); - rb_check_pages(cpu_buffer); - raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags); atomic_dec(&cpu_buffer->resize_disabled); kfree(iter->event);