[v4,0/4] Combine perf and bpf for fast eval of hw breakpoint conditions

Series

Combine perf and bpf for fast eval of hw breakpoint conditions

[v4,0/4] Combine perf and bpf for fast eval of hw breakpoint conditions
[v4,1/4] perf/bpf: Call bpf handler directly, not through overflow machinery
[v4,2/4] perf/bpf: Remove unneeded uses_default_overflow_handler.
[v4,3/4] perf/bpf: Allow a bpf program to suppress all sample side effects
[v4,4/4] selftest/bpf: Test a perf bpf program that suppresses side effects.

Message ID

20240119001352.9396-1-khuey@kylehuey.com

Headers

Received-SPF: pass (google.com: domain of
 linux-kernel+bounces-30649-ouuuleilei=gmail.com@vger.kernel.org designates
 147.75.80.249 as permitted sender) client-ip=147.75.80.249;
From: Kyle Huey <me@kylehuey.com>
To: Kyle Huey <khuey@kylehuey.com>,
	linux-kernel@vger.kernel.org,
	Andrii Nakryiko <andrii.nakryiko@gmail.com>,
	Jiri Olsa <jolsa@kernel.org>,
	Namhyung Kim <namhyung@kernel.org>,
	Marco Elver <elver@google.com>,
	Yonghong Song <yonghong.song@linux.dev>,
	Song Liu <song@kernel.org>
Cc: Robert O'Callahan <robert@ocallahan.org>,
	bpf@vger.kernel.org
Subject: [PATCH v4 0/4] Combine perf and bpf for fast eval of hw breakpoint
 conditions
Date: Thu, 18 Jan 2024 16:13:47 -0800
Message-Id: <20240119001352.9396-1-khuey@kylehuey.com>
Precedence: bulk
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit


Message

Kyle Huey Jan. 19, 2024, 12:13 a.m. UTC

rr, a userspace record and replay debugger[0], replays asynchronous events
such as signals and context switches by essentially[1] setting a breakpoint
at the address where the asynchronous event was delivered during recording,
with a condition that the program state matches the state when the event
was delivered.

Currently, rr uses software breakpoints that trap (via ptrace) to the
supervisor, and evaluates the condition from the supervisor. If the
asynchronous event is delivered in a tight loop (thus requiring the
breakpoint condition to be repeatedly evaluated), the overhead can be
immense. A patch to rr that uses hardware breakpoints via perf events, with
an attached BPF program to reject breakpoint hits where the condition is
not satisfied, reduces rr's replay overhead by 94% on a pathological (but
real and customer-provided, not contrived) rr trace.
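
For readers unfamiliar with the perf side of this, here is a minimal
sketch of how a supervisor might set up such a breakpoint. It is not
taken from the rr patch; the helper names init_exec_breakpoint_attr()
and open_conditional_breakpoint() are hypothetical, and the sketch
assumes a BPF program of type BPF_PROG_TYPE_PERF_EVENT has already been
loaded and is available as bpf_prog_fd. Only perf_event_open(2) and the
PERF_EVENT_IOC_SET_BPF ioctl are real kernel interfaces here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>
#include <linux/hw_breakpoint.h>
#include <linux/perf_event.h>

/* Hypothetical helper: describe a hardware execution breakpoint at addr. */
void init_exec_breakpoint_attr(struct perf_event_attr *attr, uint64_t addr)
{
	memset(attr, 0, sizeof(*attr));
	attr->type = PERF_TYPE_BREAKPOINT;
	attr->size = sizeof(*attr);
	attr->bp_type = HW_BREAKPOINT_X;
	attr->bp_addr = addr;
	/* perf_event_open(2) requires sizeof(long) for execution breakpoints. */
	attr->bp_len = sizeof(long);
	/* Overflow on every hit so the BPF program sees each one. */
	attr->sample_period = 1;
	attr->exclude_kernel = 1;
	attr->exclude_hv = 1;
}

/*
 * Hypothetical helper: open the breakpoint on thread tid and attach an
 * already-loaded BPF program to it. Returns the perf event fd, or -1 on
 * failure (errno is left as set by the failing call).
 */
int open_conditional_breakpoint(pid_t tid, uint64_t addr, int bpf_prog_fd)
{
	struct perf_event_attr attr;
	int fd;

	assert(bpf_prog_fd >= 0);
	init_exec_breakpoint_attr(&attr, addr);

	/* glibc has no wrapper for perf_event_open, so use syscall(2). */
	fd = (int)syscall(SYS_perf_event_open, &attr, tid, -1, -1, 0UL);
	if (fd < 0)
		return -1;

	if (ioctl(fd, PERF_EVENT_IOC_SET_BPF, bpf_prog_fd) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}
```

The attached BPF program would then compare the register state it is
handed against the recorded state and return zero for hits that do not
match. How the supervisor is notified of matching hits (signal, fd
wakeup, or sample) is left out of this sketch.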

The only obstacle to this approach is that, while the kernel allows a BPF
program to suppress sample output when a perf event overflows, it does not
suppress signalling the perf event fd or sending the perf event's SIGTRAP.
This patch set redesigns __perf_overflow_handler() and
bpf_overflow_handler() so that the former invokes the latter directly when
appropriate, rather than through the generic overflow handler machinery.
It passes the return code of the BPF program back to
__perf_overflow_handler() so that it can decide whether to execute the
regular overflow handler, reorders bpf_overflow_handler() relative to the
side effects of perf event overflow, changes __perf_overflow_handler() to
suppress those side effects if the BPF program returns zero, and adds a
selftest.
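
The intended ordering can be illustrated with a toy userspace model.
This is not kernel code: struct model_event, model_overflow() and
state_matches() are hypothetical names invented for illustration, and
the real __perf_overflow_handler() has a different signature and more
side effects than the three counters modelled here. The sketch only
shows the rule that the BPF program runs first and a zero return
suppresses everything that follows.

```c
#include <assert.h>
#include <stddef.h>

struct model_event;

/* Stand-in for an attached BPF program: zero means "reject this hit". */
typedef int (*model_bpf_prog)(const struct model_event *ev);

/* Toy stand-in for a perf event; the counters model the side effects. */
struct model_event {
	model_bpf_prog prog;   /* NULL when no BPF program is attached */
	long expected_state;   /* program state recorded with the event */
	long current_state;    /* program state at this breakpoint hit */
	int fd_wakeups;        /* models signalling the perf event fd */
	int sigtraps;          /* models sending the event's SIGTRAP */
	int samples;           /* models the regular overflow handler */
};

/*
 * Model of the reordered overflow path: run the BPF program before any
 * side effect, and skip all of them if it returns zero. Returns 1 if
 * the side effects ran, 0 if they were suppressed.
 */
int model_overflow(struct model_event *ev)
{
	assert(ev != NULL);

	if (ev->prog && !ev->prog(ev))
		return 0;

	ev->fd_wakeups++;
	ev->sigtraps++;
	ev->samples++;
	return 1;
}

/* Example condition: accept the hit only when the state matches. */
int state_matches(const struct model_event *ev)
{
	return ev->current_state == ev->expected_state;
}
```

With state_matches attached, a hit whose current_state differs from
expected_state leaves all three counters untouched, while a matching
hit increments each of them once. An event with no program attached
always takes the side-effect path.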

The previous version of this patchset can be found at
https://lore.kernel.org/linux-kernel/20231211045543.31741-1-khuey@kylehuey.com/

Changes since v3:

Patches 1, 2, 3 added various Acked-by.

Patch 4 addresses Song's review comments by dropping signals_expected and the
corresponding ASSERT_OKs, handling errors from signal(), and fixing multiline
comment formatting.

v2 of this patchset can be found at
https://lore.kernel.org/linux-kernel/20231207163458.5554-1-khuey@kylehuey.com/

Changes since v2:

Patches 1 and 2 were added from a suggestion by Namhyung Kim to refactor
this code to implement this feature in a cleaner way. Patch 2 is separated
for the benefit of the ARM arch maintainers.

Patch 3 conceptually supersedes v2's patches 1 and 2, now with a cleaner
implementation thanks to the earlier refactoring.

Patch 4 is v2's patch 3, and addresses review comments about C++ style
comments, getting a TRAP_PERF definition into the test, and unnecessary
NULL checks.

[0] https://rr-project.org/
[1] Various optimizations exist to skip as much execution as possible
before setting a breakpoint, and to determine a set of program state that
is practical to check and verify.

Comments

Jiri Olsa Jan. 19, 2024, 11:51 a.m. UTC | #1

On Thu, Jan 18, 2024 at 04:13:47PM -0800, Kyle Huey wrote:
> rr, a userspace record and replay debugger[0], replays asynchronous events
> such as signals and context switches by essentially[1] setting a breakpoint
> at the address where the asynchronous event was delivered during recording
> with a condition that the program state matches the state when the event
> was delivered.
> 
> Currently, rr uses software breakpoints that trap (via ptrace) to the
> supervisor, and evaluates the condition from the supervisor. If the
> asynchronous event is delivered in a tight loop (thus requiring the
> breakpoint condition to be repeatedly evaluated) the overhead can be
> immense. A patch to rr that uses hardware breakpoints via perf events with
> an attached BPF program to reject breakpoint hits where the condition is
> not satisfied reduces rr's replay overhead by 94% on a pathological (but a
> real customer-provided, not contrived) rr trace.
> 
> The only obstacle to this approach is that while the kernel allows a BPF
> program to suppress sample output when a perf event overflows it does not
> suppress signalling the perf event fd or sending the perf event's SIGTRAP.
> This patch set redesigns __perf_overflow_handler() and
> bpf_overflow_handler() so that the former invokes the latter directly when
> appropriate rather than through the generic overflow handler machinery,
> passes the return code of the BPF program back to __perf_overflow_handler()
> to allow it to decide whether to execute the regular overflow handler,
> reorders bpf_overflow_handler() and the side effects of perf event
> overflow, changes __perf_overflow_handler() to suppress those side effects
> if the BPF program returns zero, and adds a selftest.
> 
> The previous version of this patchset can be found at
> https://lore.kernel.org/linux-kernel/20231211045543.31741-1-khuey@kylehuey.com/
> 
> Changes since v3:
> 
> Patches 1, 2, 3 added various Acked-by.
> 
> Patch 4 addresses Song's review comments by dropping signals_expected and the
> corresponding ASSERT_OKs, handling errors from signal(), and fixing multiline
> comment formatting.

Acked-by: Jiri Olsa <jolsa@kernel.org>

jirka

> 
> v2 of this patchset can be found at
> https://lore.kernel.org/linux-kernel/20231207163458.5554-1-khuey@kylehuey.com/
> 
> Changes since v2:
> 
> Patches 1 and 2 were added from a suggestion by Namhyung Kim to refactor
> this code to implement this feature in a cleaner way. Patch 2 is separated
> for the benefit of the ARM arch maintainers.
> 
> Patch 3 conceptually supersedes v2's patches 1 and 2, now with a cleaner
> implementation thanks to the earlier refactoring.
> 
> Patch 4 is v2's patch 3, and addresses review comments about C++ style
> comments, getting a TRAP_PERF definition into the test, and unnecessary
> NULL checks.
> 
> [0] https://rr-project.org/
> [1] Various optimizations exist to skip as much execution as possible
> before setting a breakpoint, and to determine a set of program state that
> is practical to check and verify.
> 
>