tracing/ring-buffer: Remove integrity check at end of iter read

Message ID 20230208090814.869242-1-zhengyejian1@huawei.com
State New
Headers
Series tracing/ring-buffer: Remove integrity check at end of iter read |

Commit Message

Zheng Yejian Feb. 8, 2023, 9:08 a.m. UTC
  Concurrently closing "trace" file and writing into ring buffer [1] can
cause WARNINGs [2]. It has been reported in
Link: https://lore.kernel.org/all/20230203035608.2336906-1-zhengyejian1@huawei.com/

It seems a data race between ring_buffer writing and integrity check.
That is, RB_FLAG of head_page is been updating, while at same time RB_FLAG
was cleared when doing integrity check:
  rb_check_pages()            rb_handle_head_page():
  --------                    --------
  rb_head_page_deactivate()
                              rb_head_page_set_normal()
  rb_head_page_activate()

Integrity check at end of iter read was added since commit 659f451ff213
("ring-buffer: Add integrity check at end of iter read"). As it's commit
message said:
  > As reading via an iterator requires disabling the ring buffer, it
  > is a perfect place to have it.
However, since commit 1039221cc278 ("ring-buffer: Do not disable recording
when there is an iterator"), ring buffer was not disabled at that place,
so that integrity check should be removed.

1:
``` read_trace.sh
  while true;
  do
    # the "trace" file is closed after read
    head -1 /sys/kernel/tracing/trace > /dev/null
  done
```
``` repro.sh
  sysctl -w kernel.panic_on_warn=1
  # function tracer will writing enough data into ring_buffer
  echo function > /sys/kernel/tracing/current_tracer
  ./read_trace.sh &
  ./read_trace.sh &
  ./read_trace.sh &
  ./read_trace.sh &
  ./read_trace.sh &
  ./read_trace.sh &
  ./read_trace.sh &
  ./read_trace.sh &
```

2:
------------[ cut here ]------------
WARNING: CPU: 9 PID: 62 at kernel/trace/ring_buffer.c:2653
rb_move_tail+0x450/0x470
Modules linked in:
CPU: 9 PID: 62 Comm: ksoftirqd/9 Tainted: G        W          6.2.0-rc6+
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014
RIP: 0010:rb_move_tail+0x450/0x470
Code: ff ff 4c 89 c8 f0 4d 0f b1 02 48 89 c2 48 83 e2 fc 49 39 d0 75 24
83 e0 03 83 f8 02 0f 84 e1 fb ff ff 48 8b 57 10 f0 ff 42 08 <0f> 0b 83
f8 02 0f 84 ce fb ff ff e9 db
RSP: 0018:ffffb5564089bd00 EFLAGS: 00000203
RAX: 0000000000000000 RBX: ffff9db385a2bf81 RCX: ffffb5564089bd18
RDX: ffff9db281110100 RSI: 0000000000000fe4 RDI: ffff9db380145400
RBP: ffff9db385a2bf80 R08: ffff9db385a2bfc0 R09: ffff9db385a2bfc2
R10: ffff9db385a6c000 R11: ffff9db385a2bf80 R12: 0000000000000000
R13: 00000000000003e8 R14: ffff9db281110100 R15: ffffffffbb006108
FS:  0000000000000000(0000) GS:ffff9db3bdcc0000(0000)
knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00005602323024c8 CR3: 0000000022e0c000 CR4: 00000000000006e0
Call Trace:
 <TASK>
 ring_buffer_lock_reserve+0x136/0x360
 ? __do_softirq+0x287/0x2df
 ? __pfx_rcu_softirq_qs+0x10/0x10
 trace_function+0x21/0x110
 ? __pfx_rcu_softirq_qs+0x10/0x10
 ? __do_softirq+0x287/0x2df
 function_trace_call+0xf6/0x120
 0xffffffffc038f097
 ? rcu_softirq_qs+0x5/0x140
 rcu_softirq_qs+0x5/0x140
 __do_softirq+0x287/0x2df
 run_ksoftirqd+0x2a/0x30
 smpboot_thread_fn+0x188/0x220
 ? __pfx_smpboot_thread_fn+0x10/0x10
 kthread+0xe7/0x110
 ? __pfx_kthread+0x10/0x10
 ret_from_fork+0x2c/0x50
 </TASK>
---[ end trace 0000000000000000 ]---

Fixes: 1039221cc278 ("ring-buffer: Do not disable recording when there is an iterator")
Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
---
 kernel/trace/ring_buffer.c | 11 -----------
 1 file changed, 11 deletions(-)
  

Comments

Steven Rostedt Feb. 8, 2023, 10:36 p.m. UTC | #1
On Wed, 8 Feb 2023 17:08:14 +0800
Zheng Yejian <zhengyejian1@huawei.com> wrote:

> Concurrently closing "trace" file and writing into ring buffer [1] can
> cause WARNINGs [2]. It has been reported in
> Link: https://lore.kernel.org/all/20230203035608.2336906-1-zhengyejian1@huawei.com/
> 
> It seems a data race between ring_buffer writing and integrity check.
> That is, RB_FLAG of head_page is been updating, while at same time RB_FLAG
> was cleared when doing integrity check:
>   rb_check_pages()            rb_handle_head_page():
>   --------                    --------
>   rb_head_page_deactivate()
>                               rb_head_page_set_normal()
>   rb_head_page_activate()
> 

Good catch!

> Integrity check at end of iter read was added since commit 659f451ff213
> ("ring-buffer: Add integrity check at end of iter read"). As it's commit
> message said:
>   > As reading via an iterator requires disabling the ring buffer, it
>   > is a perfect place to have it.  
> However, since commit 1039221cc278 ("ring-buffer: Do not disable recording
> when there is an iterator"), ring buffer was not disabled at that place,
> so that integrity check should be removed.
> 
> 1:
> ``` read_trace.sh
>   while true;
>   do
>     # the "trace" file is closed after read
>     head -1 /sys/kernel/tracing/trace > /dev/null
>   done
> ```
> ``` repro.sh
>   sysctl -w kernel.panic_on_warn=1
>   # function tracer will writing enough data into ring_buffer
>   echo function > /sys/kernel/tracing/current_tracer
>   ./read_trace.sh &
>   ./read_trace.sh &
>   ./read_trace.sh &
>   ./read_trace.sh &
>   ./read_trace.sh &
>   ./read_trace.sh &
>   ./read_trace.sh &
>   ./read_trace.sh &
> ```
> 


> Fixes: 1039221cc278 ("ring-buffer: Do not disable recording when there is an iterator")
> Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
> ---
>  kernel/trace/ring_buffer.c | 11 -----------
>  1 file changed, 11 deletions(-)
> 
> diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
> index c366a0a9ddba..34e955bd1e59 100644
> --- a/kernel/trace/ring_buffer.c
> +++ b/kernel/trace/ring_buffer.c
> @@ -5203,17 +5203,6 @@ void
>  ring_buffer_read_finish(struct ring_buffer_iter *iter)
>  {
>  	struct ring_buffer_per_cpu *cpu_buffer = iter->cpu_buffer;
> -	unsigned long flags;
> -
> -	/*
> -	 * Ring buffer is disabled from recording, here's a good place
> -	 * to check the integrity of the ring buffer.
> -	 * Must prevent readers from trying to read, as the check
> -	 * clears the HEAD page and readers require it.
> -	 */
> -	raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
> -	rb_check_pages(cpu_buffer);
> -	raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);

I would rather find a way to make this still work than just removing it.

Perhaps there's no reason to clear the flags, and change rb_check_pages()
to mask them out before testing. Something like:

static int rb_check_pages(struct ring_buffer_per_cpu *cpu_buffer)
{
	struct list_head *head = cpu_buffer->pages;
	struct buffer_page *bpage, *tmp;

	if (RB_WARN_ON(cpu_buffer, rb_list_head(rb_list_head(head->next)->prev) != head))
		return -1;
	if (RB_WARN_ON(cpu_buffer, rb_list_head(rb_list_head(head->prev)->next) != head))
		return -1;

	if (rb_check_list(cpu_buffer, head))
		return -1;

	list_for_each_entry_safe(bpage, tmp, head, list) {
		if (RB_WARN_ON(cpu_buffer,
		     rb_list_head(rb_list_head(bpage->list.next)->prev) != &bpage->list))
			return -1;
		if (RB_WARN_ON(cpu_buffer,
		     rb_list_head(rb_list_head(bpage->list.prev)->next) != &bpage->list))
			return -1;
		if (rb_check_list(cpu_buffer, &bpage->list))
			return -1;
	}

	return 0;
}

I haven't tested the above.

?

-- Steve


>  
>  	atomic_dec(&cpu_buffer->resize_disabled);
>  	kfree(iter->event);
  
Zheng Yejian Feb. 9, 2023, 3:50 a.m. UTC | #2
On 2023/2/9 06:36, Steven Rostedt wrote:
> On Wed, 8 Feb 2023 17:08:14 +0800
> Zheng Yejian <zhengyejian1@huawei.com> wrote:
> 
>> Concurrently closing "trace" file and writing into ring buffer [1] can
>> cause WARNINGs [2]. It has been reported in
>> Link: https://lore.kernel.org/all/20230203035608.2336906-1-zhengyejian1@huawei.com/
>>
>> It seems a data race between ring_buffer writing and integrity check.
>> That is, RB_FLAG of head_page is been updating, while at same time RB_FLAG
>> was cleared when doing integrity check:
>>    rb_check_pages()            rb_handle_head_page():
>>    --------                    --------
>>    rb_head_page_deactivate()
>>                                rb_head_page_set_normal()
>>    rb_head_page_activate()
>>
> 
> Good catch!

Thanks!

> 
>> Integrity check at end of iter read was added since commit 659f451ff213
>> ("ring-buffer: Add integrity check at end of iter read"). As it's commit
>> message said:
>>    > As reading via an iterator requires disabling the ring buffer, it
>>    > is a perfect place to have it.
>> However, since commit 1039221cc278 ("ring-buffer: Do not disable recording
>> when there is an iterator"), ring buffer was not disabled at that place,
>> so that integrity check should be removed.
>>
>> 1:
>> ``` read_trace.sh
>>    while true;
>>    do
>>      # the "trace" file is closed after read
>>      head -1 /sys/kernel/tracing/trace > /dev/null
>>    done
>> ```
>> ``` repro.sh
>>    sysctl -w kernel.panic_on_warn=1
>>    # function tracer will writing enough data into ring_buffer
>>    echo function > /sys/kernel/tracing/current_tracer
>>    ./read_trace.sh &
>>    ./read_trace.sh &
>>    ./read_trace.sh &
>>    ./read_trace.sh &
>>    ./read_trace.sh &
>>    ./read_trace.sh &
>>    ./read_trace.sh &
>>    ./read_trace.sh &
>> ```
>>
> 
> 
>> Fixes: 1039221cc278 ("ring-buffer: Do not disable recording when there is an iterator")
>> Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
>> ---
>>   kernel/trace/ring_buffer.c | 11 -----------
>>   1 file changed, 11 deletions(-)
>>
>> diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
>> index c366a0a9ddba..34e955bd1e59 100644
>> --- a/kernel/trace/ring_buffer.c
>> +++ b/kernel/trace/ring_buffer.c
>> @@ -5203,17 +5203,6 @@ void
>>   ring_buffer_read_finish(struct ring_buffer_iter *iter)
>>   {
>>   	struct ring_buffer_per_cpu *cpu_buffer = iter->cpu_buffer;
>> -	unsigned long flags;
>> -
>> -	/*
>> -	 * Ring buffer is disabled from recording, here's a good place
>> -	 * to check the integrity of the ring buffer.
>> -	 * Must prevent readers from trying to read, as the check
>> -	 * clears the HEAD page and readers require it.
>> -	 */
>> -	raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
>> -	rb_check_pages(cpu_buffer);
>> -	raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
> 
> I would rather find a way to make this still work than just removing it.

Yes, we can try to find the way.

> 
> Perhaps there's no reason to clear the flags, and change rb_check_pages()
> to mask them out before testing. Something like:
> 
> static int rb_check_pages(struct ring_buffer_per_cpu *cpu_buffer)
> {
> 	struct list_head *head = cpu_buffer->pages;
> 	struct buffer_page *bpage, *tmp;
> 
> 	if (RB_WARN_ON(cpu_buffer, rb_list_head(rb_list_head(head->next)->prev) != head))
> 		return -1;
> 	if (RB_WARN_ON(cpu_buffer, rb_list_head(rb_list_head(head->prev)->next) != head))
> 		return -1;
> 
> 	if (rb_check_list(cpu_buffer, head))

rb_check_list() expect to check a page with RB_FLAG being cleared,
but in this solution, rb_head_page_deactivate() is not called before,
so we may not call it directly? The same problem with below check for
"bpage->list".

> 		return -1;
> 
> 	list_for_each_entry_safe(bpage, tmp, head, list) {

I'd like to know if there is a case that "head" happens to be a
"reader_page", and the ring buffer is not exactly being traversed?

> 		if (RB_WARN_ON(cpu_buffer,
> 		     rb_list_head(rb_list_head(bpage->list.next)->prev) != &bpage->list))
> 			return -1;
> 		if (RB_WARN_ON(cpu_buffer,
> 		     rb_list_head(rb_list_head(bpage->list.prev)->next) != &bpage->list))
> 			return -1;
> 		if (rb_check_list(cpu_buffer, &bpage->list))
> 			return -1;
> 	}
> 
> 	return 0;
> }
> 
> I haven't tested the above.
> 
> ?
> 
> -- Steve
> 
> 
>>   
>>   	atomic_dec(&cpu_buffer->resize_disabled);
>>   	kfree(iter->event);
>
  
Mukesh Ojha Feb. 10, 2023, 3:22 p.m. UTC | #3
On 2023/2/9 06:36, Steven Rostedt wrote:
>> On Wed, 8 Feb 2023 17:08:14 +0800
>> Zheng Yejian <zhengyejian1@huawei.com> wrote:
>> 
>>> Concurrently closing "trace" file and writing into ring buffer [1] can
>>> cause WARNINGs [2]. It has been reported in
>>> Link: https://lore.kernel.org/all/20230203035608.2336906-1-zhengyejian1@huawei.com/
>>>
>>> It seems a data race between ring_buffer writing and integrity check.
>>> That is, RB_FLAG of head_page is been updating, while at same time RB_FLAG
>>> was cleared when doing integrity check:
>>>    rb_check_pages()            rb_handle_head_page():
>>>    --------                    --------
>>>    rb_head_page_deactivate()
>>>                                rb_head_page_set_normal()
>>>    rb_head_page_activate()
>>>
>> 
>> Good catch!
>
>Thanks!
>
>> 
>>> Integrity check at end of iter read was added since commit 659f451ff213
>>> ("ring-buffer: Add integrity check at end of iter read"). As it's commit
>>> message said:
>>>    > As reading via an iterator requires disabling the ring buffer, it
>>>    > is a perfect place to have it.
>>> However, since commit 1039221cc278 ("ring-buffer: Do not disable recording
>>> when there is an iterator"), ring buffer was not disabled at that place,
>>> so that integrity check should be removed.
>>>
>>> 1:
>>> ``` read_trace.sh
>>>    while true;
>>>    do
>>>      # the "trace" file is closed after read
>>>      head -1 /sys/kernel/tracing/trace > /dev/null
>>>    done
>>> ```
>>> ``` repro.sh
>>>    sysctl -w kernel.panic_on_warn=1
>>>    # function tracer will writing enough data into ring_buffer
>>>    echo function > /sys/kernel/tracing/current_tracer
>>>    ./read_trace.sh &
>>>    ./read_trace.sh &
>>>    ./read_trace.sh &
>>>    ./read_trace.sh &
>>>    ./read_trace.sh &
>>>    ./read_trace.sh &
>>>    ./read_trace.sh &
>>>    ./read_trace.sh &
>>> ```
>>>
>> 
>> 
>>> Fixes: 1039221cc278 ("ring-buffer: Do not disable recording when there is an iterator")
>>> Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
>>> ---
>>>   kernel/trace/ring_buffer.c | 11 -----------
>>>   1 file changed, 11 deletions(-)
>>>
>>> diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
>>> index c366a0a9ddba..34e955bd1e59 100644
>>> --- a/kernel/trace/ring_buffer.c
>>> +++ b/kernel/trace/ring_buffer.c
>>> @@ -5203,17 +5203,6 @@ void
>>>   ring_buffer_read_finish(struct ring_buffer_iter *iter)
>>>   {
>>>   	struct ring_buffer_per_cpu *cpu_buffer = iter->cpu_buffer;
>>> -	unsigned long flags;
>>> -
>>> -	/*
>>> -	 * Ring buffer is disabled from recording, here's a good place
>>> -	 * to check the integrity of the ring buffer.
>>> -	 * Must prevent readers from trying to read, as the check
>>> -	 * clears the HEAD page and readers require it.
>>> -	 */
>>> -	raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
>>> -	rb_check_pages(cpu_buffer);
>>> -	raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
>> 
>> I would rather find a way to make this still work than just removing it.
>
>Yes, we can try to find the way.
>
>> 
>> Perhaps there's no reason to clear the flags, and change rb_check_pages()
>> to mask them out before testing. Something like:
>> 
>> static int rb_check_pages(struct ring_buffer_per_cpu *cpu_buffer)
>> {
>> 	struct list_head *head = cpu_buffer->pages;
>> 	struct buffer_page *bpage, *tmp;
>> 
>> 	if (RB_WARN_ON(cpu_buffer, rb_list_head(rb_list_head(head->next)->prev) != head))
>> 		return -1;
>> 	if (RB_WARN_ON(cpu_buffer, rb_list_head(rb_list_head(head->prev)->next) != head))
>> 		return -1;
>> 
>> 	if (rb_check_list(cpu_buffer, head))
>
>rb_check_list() expect to check a page with RB_FLAG being cleared,
>but in this solution, rb_head_page_deactivate() is not called before,
>so we may not call it directly? The same problem with below check for
>"bpage->list".

Correct.


>> 		return -1;
>> 
>> 	list_for_each_entry_safe(bpage, tmp, head, list) {
>
>I'd like to know if there is a case that "head" happens to be a
>"reader_page", and the ring buffer is not exactly being traversed?

In my issue, i see below callstack and it seem to be spinning inside rb_list_head_clear() as
cpu_buffer->pages has duplicate entry in the list.

-00 |rb_list_head_clear(inline)
-00 |rb_head_page_deactivate(inline)
-00 |rb_check_pages(cpu_buffer = 0xFFFFFF89E0C3B200)
-01 |atomic_try_cmpxchg_acquire(inline)
-01 |queued_spin_lock(inline)
-01 |do_raw_spin_lock_flags(inline)
-01 |__raw_spin_lock_irqsave(inline)
-01 |_raw_spin_lock_irqsave(inline)
-01 |ring_buffer_read_finish(iter = 0xFFFFFF8006FE3780)
-02 |cpumask_next(inline)
-02 |tracing_release(inode = ?, file = 0xFFFFFF8A53A63F00)
-03 |__fput(file = 0xFFFFFF8A53A63F00)
-04 |____fput(work = ?)
-05 |_raw_spin_unlock_irq(inline)
-05 |task_work_run()
-06 |tracehook_notify_resume(inline)
-06 |do_notify_resume(regs = 0xFFFFFFC06ADC8EB0, thread_flags = 67108868)
-07 |prepare_exit_to_user_mode(inline)
-07 |exit_to_user_mode(inline)
-07 |el0_svc(regs = 0xFFFFFFC06ADC8EB0)
-08 |el0t_64_sync_handler(regs = ?)
-09 |el0t_64_sync(asm)

...
..
ffffff80359eeb00 --> Duplicate entry 
ffffff80359ee300
ffffff80359ee180
ffffff80359eeec0
ffffff80359eec00
ffffff80359ee800 -- Tail page
ffffff80359eedc0 -- Head page
ffffff80359ee640
ffffff80359ee080
ffffff80359ee700
ffffff80359ee7c0
ffffff80359eed80
ffffff80359ee900
ffffff80359ee9c0
ffffff80359eea00
ffffff80359eea80
ffffff80359eec80
ffffff80359ee240
ffffff80359ee6c0
ffffff80359ee0c0
ffffff80359ee8c0
ffffff80359ee940
ffffff80359eee00
ffffff80359ee000
ffffff80359eeb00 ---> Duplicate entry 


-Mukesh

>
>> 		if (RB_WARN_ON(cpu_buffer,
>> 		     rb_list_head(rb_list_head(bpage->list.next)->prev) != &bpage->list))
>> 			return -1;
>> 		if (RB_WARN_ON(cpu_buffer,
>> 		     rb_list_head(rb_list_head(bpage->list.prev)->next) != &bpage->list))
>> 			return -1;
>> 		if (rb_check_list(cpu_buffer, &bpage->list))
>> 			return -1;
>> 	}
>> 
>> 	return 0;
>> }
>> 
>> I haven't tested the above.
>> 
>> ?
>> 
>> -- Steve
>> 
>> 
>>>   
>>>   	atomic_dec(&cpu_buffer->resize_disabled);
>>>   	kfree(iter->event);
  
Steven Rostedt Feb. 11, 2023, 4:37 p.m. UTC | #4
On Fri, 10 Feb 2023 20:52:36 +0530
Mukesh Ojha <quic_mojha@quicinc.com> wrote:

> >> 		return -1;
> >> 
> >> 	list_for_each_entry_safe(bpage, tmp, head, list) {  
> >
> >I'd like to know if there is a case that "head" happens to be a
> >"reader_page", and the ring buffer is not exactly being traversed?  

No, the way it works is that the reader page is found by searching for the
head pointer, and then it is set when swapped. Basically, the pseudo code
is:

  reader->next = head_page | HEAD_FLAG
  val = head_page->prev->next
  val &= ~FLAGS
  val |= HEAD_FLAG
  cmpxchg(head_page->prev->next, val, reader)

The HEAD_FLAG is always on the pointer that points to the head page that
gets swapped. This will never point to the reader page, as that would mean
the writer has access to it.

> 
> In my issue, i see below callstack and it seem to be spinning inside rb_list_head_clear() as
> cpu_buffer->pages has duplicate entry in the list.
> 
> -00 |rb_list_head_clear(inline)
> -00 |rb_head_page_deactivate(inline)
> -00 |rb_check_pages(cpu_buffer = 0xFFFFFF89E0C3B200)
> -01 |atomic_try_cmpxchg_acquire(inline)
> -01 |queued_spin_lock(inline)
> -01 |do_raw_spin_lock_flags(inline)
> -01 |__raw_spin_lock_irqsave(inline)
> -01 |_raw_spin_lock_irqsave(inline)
> -01 |ring_buffer_read_finish(iter = 0xFFFFFF8006FE3780)
> -02 |cpumask_next(inline)
> -02 |tracing_release(inode = ?, file = 0xFFFFFF8A53A63F00)
> -03 |__fput(file = 0xFFFFFF8A53A63F00)
> -04 |____fput(work = ?)
> -05 |_raw_spin_unlock_irq(inline)
> -05 |task_work_run()
> -06 |tracehook_notify_resume(inline)
> -06 |do_notify_resume(regs = 0xFFFFFFC06ADC8EB0, thread_flags = 67108868)
> -07 |prepare_exit_to_user_mode(inline)
> -07 |exit_to_user_mode(inline)
> -07 |el0_svc(regs = 0xFFFFFFC06ADC8EB0)
> -08 |el0t_64_sync_handler(regs = ?)
> -09 |el0t_64_sync(asm)
> 
> ...
> ..
> ffffff80359eeb00 --> Duplicate entry 
> ffffff80359ee300
> ffffff80359ee180
> ffffff80359eeec0
> ffffff80359eec00
> ffffff80359ee800 -- Tail page
> ffffff80359eedc0 -- Head page
> ffffff80359ee640
> ffffff80359ee080
> ffffff80359ee700
> ffffff80359ee7c0
> ffffff80359eed80
> ffffff80359ee900
> ffffff80359ee9c0
> ffffff80359eea00
> ffffff80359eea80
> ffffff80359eec80
> ffffff80359ee240
> ffffff80359ee6c0
> ffffff80359ee0c0
> ffffff80359ee8c0
> ffffff80359ee940
> ffffff80359eee00
> ffffff80359ee000
> ffffff80359eeb00 ---> Duplicate entry 

So this is a separate issue where the ring buffer is corrupted?

-- Steve
  
Mukesh Ojha Feb. 14, 2023, 12:14 p.m. UTC | #5
On 2/11/2023 10:07 PM, Steven Rostedt wrote:
> On Fri, 10 Feb 2023 20:52:36 +0530
> Mukesh Ojha <quic_mojha@quicinc.com> wrote:
> 
>>>> 		return -1;
>>>>
>>>> 	list_for_each_entry_safe(bpage, tmp, head, list) {
>>>
>>> I'd like to know if there is a case that "head" happens to be a
>>> "reader_page", and the ring buffer is not exactly being traversed?
> 
> No, the way it works is that the reader page is found by searching for the
> head pointer, and then it is set when swapped. Basically, the pseudo code
> is:
> 
>    reader->next = head_page | HEAD_FLAG
>    val = head_page->prev->next
>    val &= ~FLAGS
>    val |= HEAD_FLAG
>    cmpxchg(head_page->prev->next, val, reader)
> 
> The HEAD_FLAG is always on the pointer that points to the head page that
> gets swapped. This will never point to the reader page, as that would mean
> the writer has access to it.
> 
>>
>> In my issue, i see below callstack and it seem to be spinning inside rb_list_head_clear() as
>> cpu_buffer->pages has duplicate entry in the list.
>>
>> -00 |rb_list_head_clear(inline)
>> -00 |rb_head_page_deactivate(inline)
>> -00 |rb_check_pages(cpu_buffer = 0xFFFFFF89E0C3B200)
>> -01 |atomic_try_cmpxchg_acquire(inline)
>> -01 |queued_spin_lock(inline)
>> -01 |do_raw_spin_lock_flags(inline)
>> -01 |__raw_spin_lock_irqsave(inline)
>> -01 |_raw_spin_lock_irqsave(inline)
>> -01 |ring_buffer_read_finish(iter = 0xFFFFFF8006FE3780)
>> -02 |cpumask_next(inline)
>> -02 |tracing_release(inode = ?, file = 0xFFFFFF8A53A63F00)
>> -03 |__fput(file = 0xFFFFFF8A53A63F00)
>> -04 |____fput(work = ?)
>> -05 |_raw_spin_unlock_irq(inline)
>> -05 |task_work_run()
>> -06 |tracehook_notify_resume(inline)
>> -06 |do_notify_resume(regs = 0xFFFFFFC06ADC8EB0, thread_flags = 67108868)
>> -07 |prepare_exit_to_user_mode(inline)
>> -07 |exit_to_user_mode(inline)
>> -07 |el0_svc(regs = 0xFFFFFFC06ADC8EB0)
>> -08 |el0t_64_sync_handler(regs = ?)
>> -09 |el0t_64_sync(asm)
>>
>> ...
>> ..
>> ffffff80359eeb00 --> Duplicate entry
>> ffffff80359ee300
>> ffffff80359ee180
>> ffffff80359eeec0
>> ffffff80359eec00
>> ffffff80359ee800 -- Tail page
>> ffffff80359eedc0 -- Head page
>> ffffff80359ee640
>> ffffff80359ee080
>> ffffff80359ee700
>> ffffff80359ee7c0
>> ffffff80359eed80
>> ffffff80359ee900
>> ffffff80359ee9c0
>> ffffff80359eea00
>> ffffff80359eea80
>> ffffff80359eec80
>> ffffff80359ee240
>> ffffff80359ee6c0
>> ffffff80359ee0c0
>> ffffff80359ee8c0
>> ffffff80359ee940
>> ffffff80359eee00
>> ffffff80359ee000
>> ffffff80359eeb00 ---> Duplicate entry
> 
> So this is a separate issue where the ring buffer is corrupted?

It looks to be different issue and there also i see similar call stack 
of tracing_release() but in that issue it is looping forever in 
deactivate call due to list corruption.

I am not yet able to root cause a place of corruption as it is 
reproduced only once, will need to check more on this.

For this issue, i have posted at
https://lore.kernel.org/lkml/1676376403-16462-1-git-send-email-quic_mojha@quicinc.com/

-Mukesh

> 
> -- Steve
  

Patch

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index c366a0a9ddba..34e955bd1e59 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -5203,17 +5203,6 @@  void
 ring_buffer_read_finish(struct ring_buffer_iter *iter)
 {
 	struct ring_buffer_per_cpu *cpu_buffer = iter->cpu_buffer;
-	unsigned long flags;
-
-	/*
-	 * Ring buffer is disabled from recording, here's a good place
-	 * to check the integrity of the ring buffer.
-	 * Must prevent readers from trying to read, as the check
-	 * clears the HEAD page and readers require it.
-	 */
-	raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
-	rb_check_pages(cpu_buffer);
-	raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
 
 	atomic_dec(&cpu_buffer->resize_disabled);
 	kfree(iter->event);