Message ID | 20230316123028.2890338-1-elver@google.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:604a:0:0:0:0:0 with SMTP id j10csp457973wrt; Thu, 16 Mar 2023 05:42:18 -0700 (PDT) X-Google-Smtp-Source: AK7set+bVI9Pg1IHZqMMScF8Zl5MTz9lK9CPjs04dgEG5nl6xCGOrpxpftFYdS0/Xc6DAjmSPNa1 X-Received: by 2002:a17:90b:1a91:b0:234:106a:34ab with SMTP id ng17-20020a17090b1a9100b00234106a34abmr4091903pjb.6.1678970537854; Thu, 16 Mar 2023 05:42:17 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1678970537; cv=none; d=google.com; s=arc-20160816; b=h+hBx95NY1CYNyMZn44FzqWbL0b5zWm3gDuv+0TNbFF1kNSaVQeCGHxewukm6Rlihg gGj4rbDS1J6Yqz+I6nadQTKnxsz4j0aaGmqroC9XR3emnHcjeIdhCEFR9h0NycHcFOOU TbVEZ4m/5+LxasWj3p8YSiWAa6n31m746D5icImGkv06rZDfri1f+v2OjiFb9viC03To JutdJFUMho1nFZiaZt77GMDpbLGr89zPXOo3ISkMiD8ZYqGTFTTFWXMvg3gnBve2INis lEtIFAMdkPpanBGmZfq54d4M30gk8JRmMZsCCJ++6Y1rMMH7tFWuLHXhAQr0ocCyrCMK P3qA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:from:subject:message-id:mime-version:date :dkim-signature; bh=87DZZv3zvWxq1JNrpUvBlJXCIWW+Rra/IVtOi+lv2AY=; b=sEgiZU6tbrIpGJh19sIsJvBAiRtPLDznSe2JJm34Iz0r/udnSdF65njEUFykk8lQAC LnpxdTWuYW722PD/i3hFYl810CcSCeWIbcP6mKZP1FElMxPgYE46SPEv57m+HEMbG9Jr pgQqOOmYPC99MCJKNvMGdpPcL1J9vqkCuj/KWUpHWTZRljXwHKfp6DXgi1XLeB8gm/3W 8B1nA11p/of93GW7cS5FuXxeCDu1zuc+pXwmCdbKDKtez+QAQ9Tb1JToxUEKo4N6/GAr wnJWqmPoJIIuNE9565xi0JrRqvGTw5yoOUTrMxyb2QaJLWoZ3yDNPjQzeduh6m+6llWV Z7RA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=qCtLERy6; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id g18-20020a635652000000b00508aefbec7asi55626pgm.300.2023.03.16.05.42.04; Thu, 16 Mar 2023 05:42:17 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=qCtLERy6; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229913AbjCPMcA (ORCPT <rfc822;ruipengqi7@gmail.com> + 99 others); Thu, 16 Mar 2023 08:32:00 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53602 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229813AbjCPMb6 (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Thu, 16 Mar 2023 08:31:58 -0400 Received: from mail-yb1-xb49.google.com (mail-yb1-xb49.google.com [IPv6:2607:f8b0:4864:20::b49]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 70654C88BC for <linux-kernel@vger.kernel.org>; Thu, 16 Mar 2023 05:31:57 -0700 (PDT) Received: by mail-yb1-xb49.google.com with SMTP id z31-20020a25a122000000b00b38d2b9a2e9so1674446ybh.3 for <linux-kernel@vger.kernel.org>; Thu, 16 Mar 2023 05:31:57 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; t=1678969916; h=cc:to:from:subject:message-id:mime-version:date:from:to:cc:subject :date:message-id:reply-to; bh=87DZZv3zvWxq1JNrpUvBlJXCIWW+Rra/IVtOi+lv2AY=; b=qCtLERy62hEhlxSG/0eouCyDubA5IlOQzhKT2mnHloimO+vyrKei3sI8aJq1Tfj3qo 14b4Ounm+uimNdlEK6Gl1oKiwLH2oXLtNv7TDzmIibqjvnhVsYKbFeHnWom15qxZpv7E 5tOLPt/Rhk6QEnPn8W1CABp2zBMKp0CBPEXm1BE2uMklheDAqZWfiip3/rl6zFZvRNo1 r2Z8izryfAq4Y83tOgyd4a84rYY84JZ4Xsr8I1WQTjjxx5eNPvCps/nAEOnaV8B1vU7D 3V/f1c1bck0wJUie1OxMnbvg8BDG+pUfLdOJT4U31Beb6tiecEyvJAK/k9YvtACNmOeC /+tA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; t=1678969916; h=cc:to:from:subject:message-id:mime-version:date:x-gm-message-state :from:to:cc:subject:date:message-id:reply-to; bh=87DZZv3zvWxq1JNrpUvBlJXCIWW+Rra/IVtOi+lv2AY=; b=kSVt5PC6z2IvDeiE9jCpsLzsYs6naqSqrEDNHjSjo39BC77pY1iCambGFVsHeoKqRe S0h8dlTtczKRCaqhzxhpUmG90PIu2dyxUv5bFUuoYsmMcdiJo79xqzFRPaKQl2EvpQAr rhBwqS3u5sK+4gqm8TyQvyU5mIdRhYOgLHS4TWM3xA4268rmbdiQj6NjBfBsC9REogLq wa1h03WDtHMrO7cKpBSM7etK+n8vN4Fwsds28Kb71YjpoM3kXy+15dz2isZc2ZdwT2AP CI6/s3iR9l2wT1Ywz3I4AgwIhHPmLI/0lsZ3T633UqjQhIUj7JnHTn8SDDMZbsati+Ql lI6w== X-Gm-Message-State: AO0yUKX7eoQZ/L74Tmna4zMGhsZN7NQG8GdHvwY61ZSAzTbOdUjx109T Ran9c6dSiubZ1te951W3J3oQsoyp+g== X-Received: from elver.muc.corp.google.com ([2a00:79e0:9c:201:f359:6b95:96e:1317]) (user=elver job=sendgmr) by 2002:a81:b306:0:b0:541:61aa:9e60 with SMTP id r6-20020a81b306000000b0054161aa9e60mr2038578ywh.6.1678969916701; Thu, 16 Mar 2023 05:31:56 -0700 (PDT) Date: Thu, 16 Mar 2023 13:30:27 +0100 Mime-Version: 1.0 X-Mailer: git-send-email 2.40.0.rc1.284.g88254d51c5-goog Message-ID: <20230316123028.2890338-1-elver@google.com> Subject: [PATCH v6 1/2] posix-timers: Prefer delivery of signals to the current thread From: Marco Elver <elver@google.com> To: elver@google.com, Peter Zijlstra <peterz@infradead.org>, Thomas Gleixner <tglx@linutronix.de>, Ingo Molnar <mingo@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com>, "Eric W. Biederman" <ebiederm@xmission.com>, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org, Dmitry Vyukov <dvyukov@google.com>, kasan-dev@googlegroups.com Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS,USER_IN_DEF_DKIM_WL autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1760528210335745352?= X-GMAIL-MSGID: =?utf-8?q?1760528210335745352?= |
Series |
[v6,1/2] posix-timers: Prefer delivery of signals to the current thread
|
|
Commit Message
Marco Elver
March 16, 2023, 12:30 p.m. UTC
From: Dmitry Vyukov <dvyukov@google.com> POSIX timers using the CLOCK_PROCESS_CPUTIME_ID clock prefer the main thread of a thread group for signal delivery. However, this has a significant downside: it requires waking up a potentially idle thread. Instead, prefer to deliver signals to the current thread (in the same thread group) if SIGEV_THREAD_ID is not set by the user. This does not change guaranteed semantics, since POSIX process CPU time timers have never guaranteed that signal delivery is to a specific thread (without SIGEV_THREAD_ID set). The effect is that we no longer wake up potentially idle threads, and the kernel is no longer biased towards delivering the timer signal to any particular thread (which better distributes the timer signals esp. when multiple timers fire concurrently). Signed-off-by: Dmitry Vyukov <dvyukov@google.com> Suggested-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Marco Elver <elver@google.com> --- v6: - Split test from this patch. - Update wording on what this patch aims to improve. v5: - Rebased onto v6.2. v4: - Restructured checks in send_sigqueue() as suggested. v3: - Switched to the completely different implementation (much simpler) based on the Oleg's idea. RFC v2: - Added additional Cc as Thomas asked. --- kernel/signal.c | 25 ++++++++++++++++++++++--- 1 file changed, 22 insertions(+), 3 deletions(-)
Comments
On Thu, 16 Mar 2023 at 13:31, Marco Elver <elver@google.com> wrote: > > From: Dmitry Vyukov <dvyukov@google.com> > > POSIX timers using the CLOCK_PROCESS_CPUTIME_ID clock prefer the main > thread of a thread group for signal delivery. However, this has a > significant downside: it requires waking up a potentially idle thread. > > Instead, prefer to deliver signals to the current thread (in the same > thread group) if SIGEV_THREAD_ID is not set by the user. This does not > change guaranteed semantics, since POSIX process CPU time timers have > never guaranteed that signal delivery is to a specific thread (without > SIGEV_THREAD_ID set). > > The effect is that we no longer wake up potentially idle threads, and > the kernel is no longer biased towards delivering the timer signal to > any particular thread (which better distributes the timer signals esp. > when multiple timers fire concurrently). > > Signed-off-by: Dmitry Vyukov <dvyukov@google.com> > Suggested-by: Oleg Nesterov <oleg@redhat.com> > Reviewed-by: Oleg Nesterov <oleg@redhat.com> > Signed-off-by: Marco Elver <elver@google.com> Gentle ping... Thanks, -- Marco
On Thu, 16 Mar 2023 at 13:31, Marco Elver <elver@google.com> wrote: > > From: Dmitry Vyukov <dvyukov@google.com> > > POSIX timers using the CLOCK_PROCESS_CPUTIME_ID clock prefer the main > thread of a thread group for signal delivery. However, this has a > significant downside: it requires waking up a potentially idle thread. > > Instead, prefer to deliver signals to the current thread (in the same > thread group) if SIGEV_THREAD_ID is not set by the user. This does not > change guaranteed semantics, since POSIX process CPU time timers have > never guaranteed that signal delivery is to a specific thread (without > SIGEV_THREAD_ID set). > > The effect is that we no longer wake up potentially idle threads, and > the kernel is no longer biased towards delivering the timer signal to > any particular thread (which better distributes the timer signals esp. > when multiple timers fire concurrently). > > Signed-off-by: Dmitry Vyukov <dvyukov@google.com> > Suggested-by: Oleg Nesterov <oleg@redhat.com> > Reviewed-by: Oleg Nesterov <oleg@redhat.com> > Signed-off-by: Marco Elver <elver@google.com> > --- > v6: > - Split test from this patch. > - Update wording on what this patch aims to improve. > > v5: > - Rebased onto v6.2. > > v4: > - Restructured checks in send_sigqueue() as suggested. > > v3: > - Switched to the completely different implementation (much simpler) > based on the Oleg's idea. > > RFC v2: > - Added additional Cc as Thomas asked. > --- > kernel/signal.c | 25 ++++++++++++++++++++++--- > 1 file changed, 22 insertions(+), 3 deletions(-) > > diff --git a/kernel/signal.c b/kernel/signal.c > index 8cb28f1df294..605445fa27d4 100644 > --- a/kernel/signal.c > +++ b/kernel/signal.c > @@ -1003,8 +1003,7 @@ static void complete_signal(int sig, struct task_struct *p, enum pid_type type) > /* > * Now find a thread we can wake up to take the signal off the queue. > * > - * If the main thread wants the signal, it gets first crack. > - * Probably the least surprising to the average bear. > + * Try the suggested task first (may or may not be the main thread). > */ > if (wants_signal(sig, p)) > t = p; > @@ -1970,8 +1969,23 @@ int send_sigqueue(struct sigqueue *q, struct pid *pid, enum pid_type type) > > ret = -1; > rcu_read_lock(); > + /* > + * This function is used by POSIX timers to deliver a timer signal. > + * Where type is PIDTYPE_PID (such as for timers with SIGEV_THREAD_ID > + * set), the signal must be delivered to the specific thread (queues > + * into t->pending). > + * > + * Where type is not PIDTYPE_PID, signals must just be delivered to the > + * current process. In this case, prefer to deliver to current if it is > + * in the same thread group as the target, as it avoids unnecessarily > + * waking up a potentially idle task. > + */ > t = pid_task(pid, type); > - if (!t || !likely(lock_task_sighand(t, &flags))) > + if (!t) > + goto ret; > + if (type != PIDTYPE_PID && same_thread_group(t, current)) > + t = current; > + if (!likely(lock_task_sighand(t, &flags))) > goto ret; > > ret = 1; /* the signal is ignored */ > @@ -1993,6 +2007,11 @@ int send_sigqueue(struct sigqueue *q, struct pid *pid, enum pid_type type) > q->info.si_overrun = 0; > > signalfd_notify(t, sig); > + /* > + * If the type is not PIDTYPE_PID, we just use shared_pending, which > + * won't guarantee that the specified task will receive the signal, but > + * is sufficient if t==current in the common case. > + */ > pending = (type != PIDTYPE_PID) ? &t->signal->shared_pending : &t->pending; > list_add_tail(&q->list, &pending->list); > sigaddset(&pending->signal, sig); > -- One last semi-gentle ping. ;-) 1. We're seeing that in some applications that use POSIX timers heavily, but where the main thread is mostly idle, the main thread receives a disproportional amount of the signals along with being woken up constantly. This is bad, because the main thread usually waits with the help of a futex or really long sleeps. Now the main thread will steal time (to go back to sleep) from another thread that could have instead just proceeded with whatever it was doing. 2. Delivering signals to random threads is currently way too expensive. We need to resort to this crazy algorithm: 1) receive timer signal, 2) check if main thread, 3) if main thread (which is likely), pick a random thread and do tgkill. To find a random thread, iterate /proc/self/task, but that's just abysmal for various reasons. Other alternatives, like inherited task clock perf events are too expensive as soon as we need to enable/disable the timers (does IPIs), and maintaining O(#threads) timers is just as horrible. This patch solves both the above issues. We acknowledge the unfortunate situation of attributing this patch to one clear subsystem and owner: it straddles into signal delivery and POSIX timers territory, and perhaps some scheduling. The patch itself only touches kernel/signal.c. If anyone has serious objections, please shout (soon'ish). Given the patch has been reviewed by Oleg, and scrutinized by Dmitry and myself, presumably we need to find a tree that currently takes kernel/signal.c patches? Thanks! -- Marco
Le Thu, Apr 06, 2023 at 04:12:04PM +0200, Marco Elver a écrit : > On Thu, 16 Mar 2023 at 13:31, Marco Elver <elver@google.com> wrote: > One last semi-gentle ping. ;-) > > 1. We're seeing that in some applications that use POSIX timers > heavily, but where the main thread is mostly idle, the main thread > receives a disproportional amount of the signals along with being > woken up constantly. This is bad, because the main thread usually > waits with the help of a futex or really long sleeps. Now the main > thread will steal time (to go back to sleep) from another thread that > could have instead just proceeded with whatever it was doing. > > 2. Delivering signals to random threads is currently way too > expensive. We need to resort to this crazy algorithm: 1) receive timer > signal, 2) check if main thread, 3) if main thread (which is likely), > pick a random thread and do tgkill. To find a random thread, iterate > /proc/self/task, but that's just abysmal for various reasons. Other > alternatives, like inherited task clock perf events are too expensive > as soon as we need to enable/disable the timers (does IPIs), and > maintaining O(#threads) timers is just as horrible. > > This patch solves both the above issues. > > We acknowledge the unfortunate situation of attributing this patch to > one clear subsystem and owner: it straddles into signal delivery and > POSIX timers territory, and perhaps some scheduling. The patch itself > only touches kernel/signal.c. > > If anyone has serious objections, please shout (soon'ish). Given the > patch has been reviewed by Oleg, and scrutinized by Dmitry and myself, > presumably we need to find a tree that currently takes kernel/signal.c > patches? > > Thanks! Thanks for the reminder! In the very unlikely case Thomas ignores this before the next merge window, I'll tentatively do a pull request to Linus. Thanks. > > -- Marco
On Thu, Mar 16, 2023 at 01:30:27PM +0100, Marco Elver wrote: > From: Dmitry Vyukov <dvyukov@google.com> > > POSIX timers using the CLOCK_PROCESS_CPUTIME_ID clock prefer the main > thread of a thread group for signal delivery. However, this has a > significant downside: it requires waking up a potentially idle thread. > > Instead, prefer to deliver signals to the current thread (in the same > thread group) if SIGEV_THREAD_ID is not set by the user. This does not > change guaranteed semantics, since POSIX process CPU time timers have > never guaranteed that signal delivery is to a specific thread (without > SIGEV_THREAD_ID set). > > The effect is that we no longer wake up potentially idle threads, and > the kernel is no longer biased towards delivering the timer signal to > any particular thread (which better distributes the timer signals esp. > when multiple timers fire concurrently). > > Signed-off-by: Dmitry Vyukov <dvyukov@google.com> > Suggested-by: Oleg Nesterov <oleg@redhat.com> > Reviewed-by: Oleg Nesterov <oleg@redhat.com> > Signed-off-by: Marco Elver <elver@google.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> > --- > kernel/signal.c | 25 ++++++++++++++++++++++--- > 1 file changed, 22 insertions(+), 3 deletions(-) > > diff --git a/kernel/signal.c b/kernel/signal.c > index 8cb28f1df294..605445fa27d4 100644 > --- a/kernel/signal.c > +++ b/kernel/signal.c > @@ -1003,8 +1003,7 @@ static void complete_signal(int sig, struct task_struct *p, enum pid_type type) > /* > * Now find a thread we can wake up to take the signal off the queue. > * > - * If the main thread wants the signal, it gets first crack. > - * Probably the least surprising to the average bear. > + * Try the suggested task first (may or may not be the main thread). > */ > if (wants_signal(sig, p)) > t = p; > @@ -1970,8 +1969,23 @@ int send_sigqueue(struct sigqueue *q, struct pid *pid, enum pid_type type) > > ret = -1; > rcu_read_lock(); > + /* > + * This function is used by POSIX timers to deliver a timer signal. > + * Where type is PIDTYPE_PID (such as for timers with SIGEV_THREAD_ID > + * set), the signal must be delivered to the specific thread (queues > + * into t->pending). > + * > + * Where type is not PIDTYPE_PID, signals must just be delivered to the > + * current process. In this case, prefer to deliver to current if it is > + * in the same thread group as the target, as it avoids unnecessarily > + * waking up a potentially idle task. > + */ > t = pid_task(pid, type); > - if (!t || !likely(lock_task_sighand(t, &flags))) > + if (!t) > + goto ret; > + if (type != PIDTYPE_PID && same_thread_group(t, current)) > + t = current; > + if (!likely(lock_task_sighand(t, &flags))) > goto ret; > > ret = 1; /* the signal is ignored */ > @@ -1993,6 +2007,11 @@ int send_sigqueue(struct sigqueue *q, struct pid *pid, enum pid_type type) > q->info.si_overrun = 0; > > signalfd_notify(t, sig); > + /* > + * If the type is not PIDTYPE_PID, we just use shared_pending, which > + * won't guarantee that the specified task will receive the signal, but > + * is sufficient if t==current in the common case. > + */ > pending = (type != PIDTYPE_PID) ? &t->signal->shared_pending : &t->pending; > list_add_tail(&q->list, &pending->list); > sigaddset(&pending->signal, sig); > -- > 2.40.0.rc1.284.g88254d51c5-goog >
diff --git a/kernel/signal.c b/kernel/signal.c index 8cb28f1df294..605445fa27d4 100644 --- a/kernel/signal.c +++ b/kernel/signal.c @@ -1003,8 +1003,7 @@ static void complete_signal(int sig, struct task_struct *p, enum pid_type type) /* * Now find a thread we can wake up to take the signal off the queue. * - * If the main thread wants the signal, it gets first crack. - * Probably the least surprising to the average bear. + * Try the suggested task first (may or may not be the main thread). */ if (wants_signal(sig, p)) t = p; @@ -1970,8 +1969,23 @@ int send_sigqueue(struct sigqueue *q, struct pid *pid, enum pid_type type) ret = -1; rcu_read_lock(); + /* + * This function is used by POSIX timers to deliver a timer signal. + * Where type is PIDTYPE_PID (such as for timers with SIGEV_THREAD_ID + * set), the signal must be delivered to the specific thread (queues + * into t->pending). + * + * Where type is not PIDTYPE_PID, signals must just be delivered to the + * current process. In this case, prefer to deliver to current if it is + * in the same thread group as the target, as it avoids unnecessarily + * waking up a potentially idle task. + */ t = pid_task(pid, type); - if (!t || !likely(lock_task_sighand(t, &flags))) + if (!t) + goto ret; + if (type != PIDTYPE_PID && same_thread_group(t, current)) + t = current; + if (!likely(lock_task_sighand(t, &flags))) goto ret; ret = 1; /* the signal is ignored */ @@ -1993,6 +2007,11 @@ int send_sigqueue(struct sigqueue *q, struct pid *pid, enum pid_type type) q->info.si_overrun = 0; signalfd_notify(t, sig); + /* + * If the type is not PIDTYPE_PID, we just use shared_pending, which + * won't guarantee that the specified task will receive the signal, but + * is sufficient if t==current in the common case. + */ pending = (type != PIDTYPE_PID) ? &t->signal->shared_pending : &t->pending; list_add_tail(&q->list, &pending->list); sigaddset(&pending->signal, sig);