Message ID | 20231209170139.33c1b452@gandalf.local.home |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:bcd1:0:b0:403:3b70:6f57 with SMTP id r17csp6255521vqy; Sat, 9 Dec 2023 14:01:20 -0800 (PST) X-Google-Smtp-Source: AGHT+IGV7kTftGQqV9grvILsuBPJrq5ZrlGxuTL4WzCEt2rl9znC1gaSQy7b6phFDuYcTz7TgXXv X-Received: by 2002:a05:6358:91a8:b0:16b:c401:e714 with SMTP id j40-20020a05635891a800b0016bc401e714mr3738012rwa.5.1702159280128; Sat, 09 Dec 2023 14:01:20 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1702159280; cv=none; d=google.com; s=arc-20160816; b=SsxmAM0xg56scwoCy5KEAGzdOnpBYgltHbz6/2MR2+BaC9cAw61TYSBKUyqOPJdPfw 1PyDBNA8CWlcD+vh436uoYUH461ntMKrAefJlFujyCVE8nC3C6kdkRHa1X7oZZ/KZi8H gg27AJRLHlzYzm//qQ5DGCsBe+K2uEORITDPZFs8zU+gmo8iuiko+JSqdV9llfL0uGL9 vfiV4oJ7Mh8o9IWaZ4RyYG4RtpDSeJg0LnKa8yqTqkLQPLHBUHqrTj9zSBihkU7FxrrS 0Hv24j1z3N7JAw4QW2MERDZ0RrPr2yklRDBLf+P1dFhcCTvRVr1tme2bYVaeM6UV0XYD 6TDg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :message-id:subject:cc:to:from:date; bh=BQU4qsa4ObN+tZJ+cvJa6jmzv18t/TvZhMdfqbunU5w=; fh=9ZHKc6QFe4ebwFfm0H61cjjvXLY1pHPLv5pJGQloVEo=; b=DQSEBuHifSKKnUrrI4ytIDfqilu9/GK1xuNwnbOI61VOmjCsx+qiPEasanVAigXhF3 PAhHl3L3bQdL/GGFCDgBdkK+aPGeZVD+3znNLpLvLQeZz46shBGEw3roqX9PV33FbFM6 xavoDbDqz6Oa1K+EhKKzIhh5aRL05X86dFsfadSz0nJJqCV2fu56r+e3HoAx+AdnwW6Q +H5hQTQ4TT8iPOUHribqht+aTUW0kqMoV5vEGOXsFWwa6Ts90vMDwhaS1y4OicM0Amw1 1zP6SDPzjoRQDEkEs0PitoAXOOJjH7jD4Duof6F85z00+hd9eWZeIMAQGs+0pPJH1o73 6YCg== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:6 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: from pete.vger.email (pete.vger.email. [2620:137:e000::3:6]) by mx.google.com with ESMTPS id h2-20020a170902f7c200b001d0b693ae03si3555755plw.500.2023.12.09.14.01.19 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sat, 09 Dec 2023 14:01:20 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:6 as permitted sender) client-ip=2620:137:e000::3:6; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:6 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by pete.vger.email (Postfix) with ESMTP id ED6E180AF25C; Sat, 9 Dec 2023 14:01:11 -0800 (PST) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.11 at pete.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229741AbjLIWBC (ORCPT <rfc822;kartikey406@gmail.com> + 99 others); Sat, 9 Dec 2023 17:01:02 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53538 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229509AbjLIWBA (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Sat, 9 Dec 2023 17:01:00 -0500 Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B4D52D9 for <linux-kernel@vger.kernel.org>; Sat, 9 Dec 2023 14:01:05 -0800 (PST) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 7D161C433C7; Sat, 9 Dec 2023 22:01:04 +0000 (UTC) Date: Sat, 9 Dec 2023 17:01:39 -0500 From: Steven Rostedt <rostedt@goodmis.org> To: LKML <linux-kernel@vger.kernel.org>, Linux Trace Kernel <linux-trace-kernel@vger.kernel.org> Cc: Masami Hiramatsu <mhiramat@kernel.org>, Mark Rutland <mark.rutland@arm.com>, Mathieu Desnoyers <mathieu.desnoyers@efficios.com>, Kent Overstreet <kent.overstreet@linux.dev> Subject: [PATCH] ring-buffer: Fix buffer max_data_size with max_event_size Message-ID: <20231209170139.33c1b452@gandalf.local.home> X-Mailer: Claws Mail 3.19.1 (GTK+ 2.24.33; x86_64-pc-linux-gnu) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=-0.8 required=5.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on pete.vger.email Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (pete.vger.email [0.0.0.0]); Sat, 09 Dec 2023 14:01:12 -0800 (PST) X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1784843368980197613 X-GMAIL-MSGID: 1784843368980197613 |
Series |
ring-buffer: Fix buffer max_data_size with max_event_size
|
|
Commit Message
Steven Rostedt
Dec. 9, 2023, 10:01 p.m. UTC
From: "Steven Rostedt (Google)" <rostedt@goodmis.org> The maximum ring buffer data size is the maximum size of data that can be recorded on the ring buffer. Events must be smaller than the sub buffer data size minus any meta data. This size is checked before trying to allocate from the ring buffer because the allocation assumes that the size will fit on the sub buffer. The maximum size was calculated as the size of a sub buffer page (which is currently PAGE_SIZE minus the sub buffer header) minus the size of the meta data of an individual event. But it missed the possible adding of a time stamp for events that are added long enough apart that the event meta data can't hold the time delta. When an event is added that is greater than the current BUF_MAX_DATA_SIZE minus the size of a time stamp, but still less than or equal to BUF_MAX_DATA_SIZE, the ring buffer would go into an infinite loop, looking for a page that can hold the event. Luckily, there's a check for this loop and after 1000 iterations and a warning is emitted and the ring buffer is disabled. But this should never happen. This can happen when a large event is added first, or after a long period where an absolute timestamp is prefixed to the event, increasing its size by 8 bytes. This passes the check and then goes into the algorithm that causes the infinite loop. Fix this by creating a BUF_MAX_EVENT_SIZE to be used to determine if the passed in event is too big for the buffer. Reported-by: Kent Overstreet <kent.overstreet@linux.dev> # (on IRC) Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> --- kernel/trace/ring_buffer.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-)
Comments
On Sat, 9 Dec 2023 17:01:39 -0500 Steven Rostedt <rostedt@goodmis.org> wrote: > From: "Steven Rostedt (Google)" <rostedt@goodmis.org> > > The maximum ring buffer data size is the maximum size of data that can be > recorded on the ring buffer. Events must be smaller than the sub buffer > data size minus any meta data. This size is checked before trying to > allocate from the ring buffer because the allocation assumes that the size > will fit on the sub buffer. > > The maximum size was calculated as the size of a sub buffer page (which is > currently PAGE_SIZE minus the sub buffer header) minus the size of the > meta data of an individual event. But it missed the possible adding of a > time stamp for events that are added long enough apart that the event meta > data can't hold the time delta. > > When an event is added that is greater than the current BUF_MAX_DATA_SIZE > minus the size of a time stamp, but still less than or equal to > BUF_MAX_DATA_SIZE, the ring buffer would go into an infinite loop, looking > for a page that can hold the event. Luckily, there's a check for this loop > and after 1000 iterations and a warning is emitted and the ring buffer is > disabled. But this should never happen. > > This can happen when a large event is added first, or after a long period > where an absolute timestamp is prefixed to the event, increasing its size > by 8 bytes. This passes the check and then goes into the algorithm that > causes the infinite loop. > > Fix this by creating a BUF_MAX_EVENT_SIZE to be used to determine if the > passed in event is too big for the buffer. > Forgot to add: Cc: stable@vger.kernel.org Fixes: a4543a2fa9ef3 ("ring-buffer: Get timestamp after event is allocated") -- Steve > Reported-by: Kent Overstreet <kent.overstreet@linux.dev> # (on IRC) > Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> > --- > kernel/trace/ring_buffer.c | 7 +++++-- > 1 file changed, 5 insertions(+), 2 deletions(-) > > diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c > index 8d2a4f00eca9..a38e5a3c6803 100644 > --- a/kernel/trace/ring_buffer.c > +++ b/kernel/trace/ring_buffer.c > @@ -378,6 +378,9 @@ static inline bool test_time_stamp(u64 delta) > /* Max payload is BUF_PAGE_SIZE - header (8bytes) */ > #define BUF_MAX_DATA_SIZE (BUF_PAGE_SIZE - (sizeof(u32) * 2)) > > +/* Events may have a time stamp attached to them */ > +#define BUF_MAX_EVENT_SIZE (BUF_MAX_DATA_SIZE - RB_LEN_TIME_EXTEND) > + > int ring_buffer_print_page_header(struct trace_seq *s) > { > struct buffer_data_page field; > @@ -3810,7 +3813,7 @@ ring_buffer_lock_reserve(struct trace_buffer *buffer, unsigned long length) > if (unlikely(atomic_read(&cpu_buffer->record_disabled))) > goto out; > > - if (unlikely(length > BUF_MAX_DATA_SIZE)) > + if (unlikely(length > BUF_MAX_EVENT_SIZE)) > goto out; > > if (unlikely(trace_recursive_lock(cpu_buffer))) > @@ -3960,7 +3963,7 @@ int ring_buffer_write(struct trace_buffer *buffer, > if (atomic_read(&cpu_buffer->record_disabled)) > goto out; > > - if (length > BUF_MAX_DATA_SIZE) > + if (length > BUF_MAX_EVENT_SIZE) > goto out; > > if (unlikely(trace_recursive_lock(cpu_buffer)))
On Sat, 9 Dec 2023 17:09:25 -0500 Steven Rostedt <rostedt@goodmis.org> wrote: > On Sat, 9 Dec 2023 17:01:39 -0500 > Steven Rostedt <rostedt@goodmis.org> wrote: > > > From: "Steven Rostedt (Google)" <rostedt@goodmis.org> > > > > The maximum ring buffer data size is the maximum size of data that can be > > recorded on the ring buffer. Events must be smaller than the sub buffer > > data size minus any meta data. This size is checked before trying to > > allocate from the ring buffer because the allocation assumes that the size > > will fit on the sub buffer. > > > > The maximum size was calculated as the size of a sub buffer page (which is > > currently PAGE_SIZE minus the sub buffer header) minus the size of the > > meta data of an individual event. But it missed the possible adding of a > > time stamp for events that are added long enough apart that the event meta > > data can't hold the time delta. > > > > When an event is added that is greater than the current BUF_MAX_DATA_SIZE > > minus the size of a time stamp, but still less than or equal to > > BUF_MAX_DATA_SIZE, the ring buffer would go into an infinite loop, looking > > for a page that can hold the event. Luckily, there's a check for this loop > > and after 1000 iterations and a warning is emitted and the ring buffer is > > disabled. But this should never happen. > > > > This can happen when a large event is added first, or after a long period > > where an absolute timestamp is prefixed to the event, increasing its size > > by 8 bytes. This passes the check and then goes into the algorithm that > > causes the infinite loop. > > > > Fix this by creating a BUF_MAX_EVENT_SIZE to be used to determine if the > > passed in event is too big for the buffer. > > > > Forgot to add: > > Cc: stable@vger.kernel.org > Fixes: a4543a2fa9ef3 ("ring-buffer: Get timestamp after event is allocated") Looks good to me. Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Thanks, > > -- Steve > > > > Reported-by: Kent Overstreet <kent.overstreet@linux.dev> # (on IRC) > > Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> > > --- > > kernel/trace/ring_buffer.c | 7 +++++-- > > 1 file changed, 5 insertions(+), 2 deletions(-) > > > > diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c > > index 8d2a4f00eca9..a38e5a3c6803 100644 > > --- a/kernel/trace/ring_buffer.c > > +++ b/kernel/trace/ring_buffer.c > > @@ -378,6 +378,9 @@ static inline bool test_time_stamp(u64 delta) > > /* Max payload is BUF_PAGE_SIZE - header (8bytes) */ > > #define BUF_MAX_DATA_SIZE (BUF_PAGE_SIZE - (sizeof(u32) * 2)) > > > > +/* Events may have a time stamp attached to them */ > > +#define BUF_MAX_EVENT_SIZE (BUF_MAX_DATA_SIZE - RB_LEN_TIME_EXTEND) > > + > > int ring_buffer_print_page_header(struct trace_seq *s) > > { > > struct buffer_data_page field; > > @@ -3810,7 +3813,7 @@ ring_buffer_lock_reserve(struct trace_buffer *buffer, unsigned long length) > > if (unlikely(atomic_read(&cpu_buffer->record_disabled))) > > goto out; > > > > - if (unlikely(length > BUF_MAX_DATA_SIZE)) > > + if (unlikely(length > BUF_MAX_EVENT_SIZE)) > > goto out; > > > > if (unlikely(trace_recursive_lock(cpu_buffer))) > > @@ -3960,7 +3963,7 @@ int ring_buffer_write(struct trace_buffer *buffer, > > if (atomic_read(&cpu_buffer->record_disabled)) > > goto out; > > > > - if (length > BUF_MAX_DATA_SIZE) > > + if (length > BUF_MAX_EVENT_SIZE) > > goto out; > > > > if (unlikely(trace_recursive_lock(cpu_buffer))) >
On Mon, 11 Dec 2023 20:40:33 +0900 Masami Hiramatsu (Google) <mhiramat@kernel.org> wrote: > On Sat, 9 Dec 2023 17:09:25 -0500 > Steven Rostedt <rostedt@goodmis.org> wrote: > > > On Sat, 9 Dec 2023 17:01:39 -0500 > > Steven Rostedt <rostedt@goodmis.org> wrote: > > > > > From: "Steven Rostedt (Google)" <rostedt@goodmis.org> > > > > > > The maximum ring buffer data size is the maximum size of data that can be > > > recorded on the ring buffer. Events must be smaller than the sub buffer > > > data size minus any meta data. This size is checked before trying to > > > allocate from the ring buffer because the allocation assumes that the size > > > will fit on the sub buffer. > > > > > > The maximum size was calculated as the size of a sub buffer page (which is > > > currently PAGE_SIZE minus the sub buffer header) minus the size of the > > > meta data of an individual event. But it missed the possible adding of a > > > time stamp for events that are added long enough apart that the event meta > > > data can't hold the time delta. > > > > > > When an event is added that is greater than the current BUF_MAX_DATA_SIZE > > > minus the size of a time stamp, but still less than or equal to > > > BUF_MAX_DATA_SIZE, the ring buffer would go into an infinite loop, looking > > > for a page that can hold the event. Luckily, there's a check for this loop > > > and after 1000 iterations and a warning is emitted and the ring buffer is > > > disabled. But this should never happen. > > > > > > This can happen when a large event is added first, or after a long period > > > where an absolute timestamp is prefixed to the event, increasing its size > > > by 8 bytes. This passes the check and then goes into the algorithm that > > > causes the infinite loop. > > > > > > Fix this by creating a BUF_MAX_EVENT_SIZE to be used to determine if the > > > passed in event is too big for the buffer. > > > > > > > Forgot to add: > > > > Cc: stable@vger.kernel.org > > Fixes: a4543a2fa9ef3 ("ring-buffer: Get timestamp after event is allocated") > > Looks good to me. > > Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Actually, I found out that this can be fixed with: diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c index 86a60a0eb279..aaad104a1707 100644 --- a/kernel/trace/ring_buffer.c +++ b/kernel/trace/ring_buffer.c @@ -3594,7 +3594,7 @@ __rb_reserve_next(struct ring_buffer_per_cpu *cpu_buffer, info->length += RB_LEN_TIME_EXTEND; } else { info->delta = info->ts - info->after; - if (unlikely(test_time_stamp(info->delta))) { + if (w && unlikely(test_time_stamp(info->delta))) { info->add_timestamp |= RB_ADD_STAMP_EXTEND; info->length += RB_LEN_TIME_EXTEND; } The bug that this patch fixed was that the size that was acceptable did not take into account the added time stamp. But the time stamp should *not* be added if it's the first event on the sub buffer. And once it goes to the next buffer, it should be the first event. I ran the same tests with this change, and it works just as well. I believe this is the proper fix. I'll send a v2. Thanks! -- Steve > > Thanks, > > > > -- Steve > > > > > > > Reported-by: Kent Overstreet <kent.overstreet@linux.dev> # (on IRC) > > > Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> > > > --- > > > kernel/trace/ring_buffer.c | 7 +++++-- > > > 1 file changed, 5 insertions(+), 2 deletions(-) > > > > > > diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c > > > index 8d2a4f00eca9..a38e5a3c6803 100644 > > > --- a/kernel/trace/ring_buffer.c > > > +++ b/kernel/trace/ring_buffer.c > > > @@ -378,6 +378,9 @@ static inline bool test_time_stamp(u64 delta) > > > /* Max payload is BUF_PAGE_SIZE - header (8bytes) */ > > > #define BUF_MAX_DATA_SIZE (BUF_PAGE_SIZE - (sizeof(u32) * 2)) > > > > > > +/* Events may have a time stamp attached to them */ > > > +#define BUF_MAX_EVENT_SIZE (BUF_MAX_DATA_SIZE - RB_LEN_TIME_EXTEND) > > > + > > > int ring_buffer_print_page_header(struct trace_seq *s) > > > { > > > struct buffer_data_page field; > > > @@ -3810,7 +3813,7 @@ ring_buffer_lock_reserve(struct trace_buffer *buffer, unsigned long length) > > > if (unlikely(atomic_read(&cpu_buffer->record_disabled))) > > > goto out; > > > > > > - if (unlikely(length > BUF_MAX_DATA_SIZE)) > > > + if (unlikely(length > BUF_MAX_EVENT_SIZE)) > > > goto out; > > > > > > if (unlikely(trace_recursive_lock(cpu_buffer))) > > > @@ -3960,7 +3963,7 @@ int ring_buffer_write(struct trace_buffer *buffer, > > > if (atomic_read(&cpu_buffer->record_disabled)) > > > goto out; > > > > > > - if (length > BUF_MAX_DATA_SIZE) > > > + if (length > BUF_MAX_EVENT_SIZE) > > > goto out; > > > > > > if (unlikely(trace_recursive_lock(cpu_buffer))) > > > >
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c index 8d2a4f00eca9..a38e5a3c6803 100644 --- a/kernel/trace/ring_buffer.c +++ b/kernel/trace/ring_buffer.c @@ -378,6 +378,9 @@ static inline bool test_time_stamp(u64 delta) /* Max payload is BUF_PAGE_SIZE - header (8bytes) */ #define BUF_MAX_DATA_SIZE (BUF_PAGE_SIZE - (sizeof(u32) * 2)) +/* Events may have a time stamp attached to them */ +#define BUF_MAX_EVENT_SIZE (BUF_MAX_DATA_SIZE - RB_LEN_TIME_EXTEND) + int ring_buffer_print_page_header(struct trace_seq *s) { struct buffer_data_page field; @@ -3810,7 +3813,7 @@ ring_buffer_lock_reserve(struct trace_buffer *buffer, unsigned long length) if (unlikely(atomic_read(&cpu_buffer->record_disabled))) goto out; - if (unlikely(length > BUF_MAX_DATA_SIZE)) + if (unlikely(length > BUF_MAX_EVENT_SIZE)) goto out; if (unlikely(trace_recursive_lock(cpu_buffer))) @@ -3960,7 +3963,7 @@ int ring_buffer_write(struct trace_buffer *buffer, if (atomic_read(&cpu_buffer->record_disabled)) goto out; - if (length > BUF_MAX_DATA_SIZE) + if (length > BUF_MAX_EVENT_SIZE) goto out; if (unlikely(trace_recursive_lock(cpu_buffer)))