Message ID | 20231207163458.5554-2-khuey@kylehuey.com |
---|---|
State | New |
Headers | From: Kyle Huey <me@kylehuey.com>
To: Kyle Huey <khuey@kylehuey.com>, linux-kernel@vger.kernel.org, Andrii Nakryiko <andrii.nakryiko@gmail.com>, Jiri Olsa <jolsa@kernel.org>, Namhyung Kim <namhyung@kernel.org>, Marco Elver <elver@google.com>, Yonghong Song <yonghong.song@linux.dev>
Cc: Robert O'Callahan <robert@ocallahan.org>, Peter Zijlstra <peterz@infradead.org>, Ingo Molnar <mingo@redhat.com>, Arnaldo Carvalho de Melo <acme@kernel.org>, Mark Rutland <mark.rutland@arm.com>, Alexander Shishkin <alexander.shishkin@linux.intel.com>, Ian Rogers <irogers@google.com>, Adrian Hunter <adrian.hunter@intel.com>, linux-perf-users@vger.kernel.org
Subject: [PATCH v2 1/3] perf: Reorder overflow handler ahead of event_limit/sigtrap
Date: Thu, 7 Dec 2023 08:34:56 -0800
Message-Id: <20231207163458.5554-2-khuey@kylehuey.com>
In-Reply-To: <20231207163458.5554-1-khuey@kylehuey.com>
References: <20231207163458.5554-1-khuey@kylehuey.com> |
Series | Combine perf and bpf for fast eval of hw breakpoint conditions |
Commit Message
Kyle Huey
Dec. 7, 2023, 4:34 p.m. UTC
The perf subsystem already allows an overflow handler to clear pending_kill
to suppress a fasync signal (although nobody currently does this). Allow an
overflow handler to suppress the other visible side effects of an overflow,
namely event_limit accounting and SIGTRAP generation.
Signed-off-by: Kyle Huey <khuey@kylehuey.com>
---
kernel/events/core.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)
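
For illustration, a minimal sketch of an in-kernel consumer of this mechanism. The handler name and the condition helper are hypothetical, not from this series; the signature is perf's existing perf_overflow_handler_t:

	/*
	 * Hypothetical overflow handler, for illustration only. Clearing
	 * pending_kill before returning already suppresses the fasync
	 * signal; with this patch it also suppresses event_limit
	 * accounting and SIGTRAP generation for this overflow.
	 */
	static void my_overflow_handler(struct perf_event *event,
					struct perf_sample_data *data,
					struct pt_regs *regs)
	{
		if (!my_condition_holds(data, regs))	/* hypothetical check */
			event->pending_kill = 0;	/* swallow this overflow */
	}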
Comments
On Thu, 7 Dec 2023 at 17:35, Kyle Huey <me@kylehuey.com> wrote:
>
> The perf subsystem already allows an overflow handler to clear pending_kill
> to suppress a fasync signal (although nobody currently does this). Allow an
> overflow handler to suppress the other visible side effects of an overflow,
> namely event_limit accounting and SIGTRAP generation.
>
[...]
> @@ -9541,6 +9541,12 @@ static int __perf_event_overflow(struct perf_event *event,
>  	 */
>
>  	event->pending_kill = POLL_IN;
> +
> +	READ_ONCE(event->overflow_handler)(event, data, regs);
> +
> +	if (!event->pending_kill)
> +		return ret;

It's not at all intuitive that resetting pending_kill to 0 will
suppress everything else, too. There is no relationship between the
fasync signals and SIGTRAP. pending_kill is for the former and
pending_sigtrap is for the latter. One should not affect the other.

A nicer solution would be to properly undo the various pending_*
states (in the case of pending_sigtrap being set it should be enough
to reset pending_sigtrap to 0, and also decrement
event->ctx->nr_pending).

Although I can see why this solution is simpler. Perhaps with enough
comments it might be clearer.

Preferences?
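
(For concreteness, the undo Marco describes would look roughly like the sketch below, placed in __perf_event_overflow after the handler returns. This is an illustrative, untested sketch, not code from this thread, and it assumes nr_pending is the local_t counter in perf_event_context.)

	/* Sketch of the "undo" alternative (untested): if the handler asked
	 * to suppress this overflow, roll back the SIGTRAP bookkeeping set
	 * up earlier in __perf_event_overflow. */
	if (!event->pending_kill && event->pending_sigtrap) {
		event->pending_sigtrap = 0;
		local_dec(&event->ctx->nr_pending);
	}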
On Thu, 7 Dec 2023 at 18:47, Kyle Huey <me@kylehuey.com> wrote:
>
> On Thu, Dec 7, 2023, 9:05 AM Marco Elver <elver@google.com> wrote:
>>
>> It's not at all intuitive that resetting pending_kill to 0 will
>> suppress everything else, too. There is no relationship between the
>> fasync signals and SIGTRAP. pending_kill is for the former and
>> pending_sigtrap is for the latter. One should not affect the other.
>>
>> A nicer solution would be to properly undo the various pending_*
>> states (in the case of pending_sigtrap being set it should be enough
>> to reset pending_sigtrap to 0, and also decrement
>> event->ctx->nr_pending).
>
> I don't believe it's possible to correctly undo the event_limit
> decrement after the fact (if it's e.g. racing with the ioctl that
> adds to the event limit).
>
>> Although I can see why this solution is simpler. Perhaps with enough
>> comments it might be clearer.
>>
>> Preferences?
>
> The cleanest way would probably be to add a return value to the
> overflow handler function that controls this. It requires changing a
> bunch of arch specific code on arches I don't have access to though.

Hmm.

Maybe wait for perf maintainers to say what is preferable. (I could
live with just making sure this has no other weird side effects and
more comments.)
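
(The return-value approach Kyle mentions would amount to a signature change along these lines; a hypothetical sketch, and every architecture-specific handler and call site would need the matching update, which is the churn he refers to.)

	/* Hypothetical: the handler's return value says whether the default
	 * overflow side effects (event_limit, SIGTRAP, fasync) still run. */
	typedef int (*perf_overflow_handler_t)(struct perf_event *event,
					       struct perf_sample_data *data,
					       struct pt_regs *regs);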
Hello,

On Thu, Dec 07, 2023 at 06:53:58PM +0100, Marco Elver wrote:
> On Thu, 7 Dec 2023 at 18:47, Kyle Huey <me@kylehuey.com> wrote:
> >
> > On Thu, Dec 7, 2023, 9:05 AM Marco Elver <elver@google.com> wrote:
> >>
> >> On Thu, 7 Dec 2023 at 17:35, Kyle Huey <me@kylehuey.com> wrote:
> >> >
> >> > The perf subsystem already allows an overflow handler to clear pending_kill
> >> > to suppress a fasync signal (although nobody currently does this). Allow an
> >> > overflow handler to suppress the other visible side effects of an overflow,
> >> > namely event_limit accounting and SIGTRAP generation.

Well, I think it can still hit the throttling logic and generate
a PERF_RECORD_THROTTLE. But it should be rare..

[...]

> > The cleanest way would probably be to add a return value to the
> > overflow handler function that controls this. It requires changing a
> > bunch of arch specific code on arches I don't have access to though.
>
> Hmm.
>
> Maybe wait for perf maintainers to say what is preferable. (I could
> live with just making sure this has no other weird side effects and
> more comments.)

What if we can call bpf handler directly and check the return value?
Then I think we can also get rid of the original overflow handler.

Something like this (untested..)

Thanks,
Namhyung

---8<---

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index e85cd1c0eaf3..1eba6f5bb70b 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -809,7 +809,6 @@ struct perf_event {
 	perf_overflow_handler_t overflow_handler;
 	void *overflow_handler_context;
 #ifdef CONFIG_BPF_SYSCALL
-	perf_overflow_handler_t orig_overflow_handler;
 	struct bpf_prog *prog;
 	u64 bpf_cookie;
 #endif
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 4c72a41f11af..e1a00646dbbe 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -9471,6 +9471,12 @@ static inline bool sample_is_allowed(struct perf_event *event, struct pt_regs *r
  * Generic event overflow handling, sampling.
  */

+#ifdef CONFIG_BPF_SYSCALL
+static int bpf_overflow_handler(struct perf_event *event,
+				struct perf_sample_data *data,
+				struct pt_regs *regs);
+#endif
+
 static int __perf_event_overflow(struct perf_event *event,
 				 int throttle, struct perf_sample_data *data,
 				 struct pt_regs *regs)
@@ -9487,6 +9493,11 @@ static int __perf_event_overflow(struct perf_event *event,

 	ret = __perf_event_account_interrupt(event, throttle);

+#ifdef CONFIG_BPF_SYSCALL
+	if (event->prog && bpf_overflow_handler(event, data, regs) == 0)
+		return ret;
+#endif
+
 	/*
 	 * XXX event_limit might not quite work as expected on inherited
 	 * events
@@ -10346,7 +10357,7 @@ static void perf_event_free_filter(struct perf_event *event)
 }

 #ifdef CONFIG_BPF_SYSCALL
-static void bpf_overflow_handler(struct perf_event *event,
+static int bpf_overflow_handler(struct perf_event *event,
 				struct perf_sample_data *data,
 				struct pt_regs *regs)
 {
@@ -10355,7 +10366,7 @@ static void bpf_overflow_handler(struct perf_event *event,
 		.event = event,
 	};
 	struct bpf_prog *prog;
-	int ret = 0;
+	int ret = 1;

 	ctx.regs = perf_arch_bpf_user_pt_regs(regs);
 	if (unlikely(__this_cpu_inc_return(bpf_prog_active) != 1))
@@ -10369,10 +10380,7 @@ static void bpf_overflow_handler(struct perf_event *event,
 	rcu_read_unlock();
 out:
 	__this_cpu_dec(bpf_prog_active);
-	if (!ret)
-		return;
-
-	event->orig_overflow_handler(event, data, regs);
+	return ret;
 }

 static int perf_event_set_bpf_handler(struct perf_event *event,
@@ -10408,8 +10416,6 @@ static int perf_event_set_bpf_handler(struct perf_event *event,

 	event->prog = prog;
 	event->bpf_cookie = bpf_cookie;
-	event->orig_overflow_handler = READ_ONCE(event->overflow_handler);
-	WRITE_ONCE(event->overflow_handler, bpf_overflow_handler);
 	return 0;
 }

@@ -10420,7 +10426,6 @@ static void perf_event_free_bpf_handler(struct perf_event *event)
 	if (!prog)
 		return;

-	WRITE_ONCE(event->overflow_handler, event->orig_overflow_handler);
 	event->prog = NULL;
 	bpf_prog_put(prog);
 }
@@ -11880,13 +11885,11 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
 		overflow_handler = parent_event->overflow_handler;
 		context = parent_event->overflow_handler_context;
 #if defined(CONFIG_BPF_SYSCALL) && defined(CONFIG_EVENT_TRACING)
-		if (overflow_handler == bpf_overflow_handler) {
+		if (parent_event->prog) {
 			struct bpf_prog *prog = parent_event->prog;

 			bpf_prog_inc(prog);
 			event->prog = prog;
-			event->orig_overflow_handler =
-				parent_event->orig_overflow_handler;
 		}
 #endif
 	}
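
(Under this scheme the decision moves into the BPF program's return value: 0 suppresses the default overflow work, nonzero lets it proceed. A hedged sketch of such a BPF_PROG_TYPE_PERF_EVENT program follows; the program name, watched address, and condition are illustrative assumptions, not part of the proposal.)

	// Hypothetical BPF program evaluating a hw breakpoint condition.
	#include <linux/bpf.h>
	#include <bpf/bpf_helpers.h>

	SEC("perf_event")
	int breakpoint_condition(struct bpf_perf_event_data *ctx)
	{
		/* Return 0 to swallow this overflow, nonzero to let the
		 * kernel run the SIGTRAP/event_limit side effects. */
		return ctx->addr == 0xdeadbeef;	/* illustrative condition */
	}

	char LICENSE[] SEC("license") = "GPL";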
On Thu, Dec 7, 2023 at 2:38 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> Well, I think it can still hit the throttling logic and generate
> a PERF_RECORD_THROTTLE. But it should be rare..
>
[...]
>
> What if we can call bpf handler directly and check the return value?
> Then I think we can also get rid of the original overflow handler.
>
> Something like this (untested..)

mmm, that's an interesting idea. I'll experiment with it.

- Kyle
diff --git a/kernel/events/core.c b/kernel/events/core.c
index b704d83a28b2..19fddfc27a4a 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -9541,6 +9541,12 @@ static int __perf_event_overflow(struct perf_event *event,
 	 */

 	event->pending_kill = POLL_IN;
+
+	READ_ONCE(event->overflow_handler)(event, data, regs);
+
+	if (!event->pending_kill)
+		return ret;
+
 	if (events && atomic_dec_and_test(&event->event_limit)) {
 		ret = 1;
 		event->pending_kill = POLL_HUP;
@@ -9584,9 +9590,7 @@ static int __perf_event_overflow(struct perf_event *event,
 		irq_work_queue(&event->pending_irq);
 	}

-	READ_ONCE(event->overflow_handler)(event, data, regs);
-
-	if (*perf_event_fasync(event) && event->pending_kill) {
+	if (*perf_event_fasync(event)) {
 		event->pending_wakeup = 1;
 		irq_work_queue(&event->pending_irq);
 	}
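
(For the series' intended use, combining a hardware breakpoint with a BPF condition filter, the userspace wiring would look roughly like this sketch. The helper name is hypothetical, prog_fd is assumed to be a loaded BPF_PROG_TYPE_PERF_EVENT program, and error handling is abbreviated.)

	#include <linux/hw_breakpoint.h>
	#include <linux/perf_event.h>
	#include <string.h>
	#include <sys/ioctl.h>
	#include <sys/syscall.h>
	#include <unistd.h>

	/* Hypothetical helper: set a write watchpoint on addr whose
	 * SIGTRAPs are filtered by the given BPF program. */
	static int attach_filtered_watchpoint(void *addr, int prog_fd)
	{
		struct perf_event_attr attr;

		memset(&attr, 0, sizeof(attr));
		attr.size = sizeof(attr);
		attr.type = PERF_TYPE_BREAKPOINT;
		attr.bp_type = HW_BREAKPOINT_W;
		attr.bp_addr = (unsigned long)addr;
		attr.bp_len = HW_BREAKPOINT_LEN_4;
		attr.sample_period = 1;
		attr.sigtrap = 1;		/* SIGTRAP on every overflow... */
		attr.remove_on_exec = 1;	/* ...which requires this flag */

		int fd = syscall(SYS_perf_event_open, &attr, 0 /* this thread */,
				 -1 /* any cpu */, -1 /* no group */, 0);
		if (fd < 0)
			return -1;

		/* With the series applied, the BPF program decides per
		 * overflow whether the SIGTRAP side effect happens. */
		if (ioctl(fd, PERF_EVENT_IOC_SET_BPF, prog_fd) < 0) {
			close(fd);
			return -1;
		}
		return fd;
	}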