Message ID | 20230425114307.36889-3-wander@redhat.com
State      | New
Headers    |
From: Wander Lairson Costa <wander@redhat.com>
To: Peter Zijlstra <peterz@infradead.org>, Ingo Molnar <mingo@redhat.com>, Arnaldo Carvalho de Melo <acme@kernel.org>, Mark Rutland <mark.rutland@arm.com>, Alexander Shishkin <alexander.shishkin@linux.intel.com>, Jiri Olsa <jolsa@kernel.org>, Namhyung Kim <namhyung@kernel.org>, Ian Rogers <irogers@google.com>, Adrian Hunter <adrian.hunter@intel.com>, Will Deacon <will@kernel.org>, Waiman Long <longman@redhat.com>, Boqun Feng <boqun.feng@gmail.com>, Juri Lelli <juri.lelli@redhat.com>, Vincent Guittot <vincent.guittot@linaro.org>, Dietmar Eggemann <dietmar.eggemann@arm.com>, Steven Rostedt <rostedt@goodmis.org>, Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>, Daniel Bristot de Oliveira <bristot@redhat.com>, Valentin Schneider <vschneid@redhat.com>, "Eric W. Biederman" <ebiederm@xmission.com>, Wander Lairson Costa <wander@redhat.com>, Oleg Nesterov <oleg@redhat.com>, Brian Cain <bcain@quicinc.com>, Kefeng Wang <wangkefeng.wang@huawei.com>, Andrew Morton <akpm@linux-foundation.org>, "Liam R. Howlett" <Liam.Howlett@oracle.com>, Vlastimil Babka <vbabka@suse.cz>, Christian Brauner <brauner@kernel.org>, Andrei Vagin <avagin@gmail.com>, Shakeel Butt <shakeelb@google.com>, linux-kernel@vger.kernel.org (open list), linux-perf-users@vger.kernel.org (open list:PERFORMANCE EVENTS SUBSYSTEM)
Cc: Hu Chunyu <chuhu@redhat.com>, Paul McKenney <paulmck@kernel.org>, Thomas Gleixner <tglx@linutronix.de>
Subject: [PATCH v7 2/3] sched/task: Add the put_task_struct_atomic_safe() function
Date: Tue, 25 Apr 2023 08:43:02 -0300
Message-Id: <20230425114307.36889-3-wander@redhat.com>
In-Reply-To: <20230425114307.36889-1-wander@redhat.com>
References: <20230425114307.36889-1-wander@redhat.com>
Series     | Introduce put_task_struct_atomic_sleep()
Commit Message
Wander Lairson Costa
April 25, 2023, 11:43 a.m. UTC
Due to the possibility of indirectly acquiring sleeping locks, it is
unsafe to call put_task_struct() in atomic contexts when the kernel is
compiled with PREEMPT_RT. To mitigate this issue, this commit introduces
put_task_struct_atomic_safe(), which schedules __put_task_struct()
through call_rcu() when PREEMPT_RT is enabled. While a workqueue would
be a more natural approach, we cannot allocate dynamic memory from
atomic context in PREEMPT_RT, which would make the code more complex.

This implementation ensures safe execution in atomic contexts and
avoids any potential issues that may arise from using the non-atomic
version.

Signed-off-by: Wander Lairson Costa <wander@redhat.com>
Reported-by: Hu Chunyu <chuhu@redhat.com>
Reviewed-by: Paul McKenney <paulmck@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/sched/task.h | 35 +++++++++++++++++++++++++++++++++++
 kernel/fork.c              |  8 ++++++++
 2 files changed, 43 insertions(+)
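For context, the kind of callsite this helper targets looks roughly like
the sketch below: a task reference dropped while a raw spinlock is held,
which stays a non-sleeping, atomic context even on PREEMPT_RT. The
structure and field names here are hypothetical, not taken from the
series.

#include <linux/sched/task.h>
#include <linux/spinlock.h>

/* Hypothetical object; 'example_obj' and its fields are illustrative. */
struct example_obj {
	raw_spinlock_t lock;
	struct task_struct *owner;
};

static void example_clear_owner(struct example_obj *obj)
{
	struct task_struct *owner;

	raw_spin_lock(&obj->lock);
	owner = obj->owner;
	obj->owner = NULL;
	/*
	 * Still atomic: raw_spin_lock() remains a spinning lock on
	 * PREEMPT_RT. A plain put_task_struct() here could end up in
	 * __put_task_struct(), which indirectly takes sleeping locks;
	 * put_task_struct_atomic_safe() defers the free via call_rcu()
	 * instead, which is safe to invoke from atomic context.
	 */
	if (owner)
		put_task_struct_atomic_safe(owner);
	raw_spin_unlock(&obj->lock);
}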
Comments
On Tue, Apr 25, 2023 at 08:43:02AM -0300, Wander Lairson Costa wrote:
> diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h
> index b597b97b1f8f..cf774b83b2ec 100644
> --- a/include/linux/sched/task.h
> +++ b/include/linux/sched/task.h
> @@ -141,6 +141,41 @@ static inline void put_task_struct_many(struct task_struct *t, int nr)
>
>  void put_task_struct_rcu_user(struct task_struct *task);
>
> +extern void __delayed_put_task_struct(struct rcu_head *rhp);
> +
> +static inline void put_task_struct_atomic_safe(struct task_struct *task)
> +{
> +	if (IS_ENABLED(CONFIG_PREEMPT_RT)) {
> +		/*
> +		 * Decrement the refcount explicitly to avoid unnecessarily
> +		 * calling call_rcu.
> +		 */
> +		if (refcount_dec_and_test(&task->usage))
> +			/*
> +			 * under PREEMPT_RT, we can't call put_task_struct
> +			 * in atomic context because it will indirectly
> +			 * acquire sleeping locks.
> +			 * call_rcu() will schedule __delayed_put_task_struct()
> +			 * to be called in process context.
> +			 *
> +			 * __put_task_struct() is called when
> +			 * refcount_dec_and_test(&t->usage) succeeds.
> +			 *
> +			 * This means that it can't conflict with
> +			 * put_task_struct_rcu_user() which abuses ->rcu the same
> +			 * way; rcu_users has a reference so task->usage can't be
> +			 * zero after rcu_users 1 -> 0 transition.
> +			 *
> +			 * delayed_free_task() also uses ->rcu, but it is only called
> +			 * when it fails to fork a process. Therefore, there is no
> +			 * way it can conflict with put_task_struct().
> +			 */
> +			call_rcu(&task->rcu, __delayed_put_task_struct);
> +	} else {
> +		put_task_struct(task);
> +	}
> +}

Urgh.. that's plenty horrible. And I'm sure everybody plus kitchen sink
has already asked why can't we just rcu free the thing unconditionally.

Google only found me an earlier version of this same patch set, but I'm
sure we've had that discussion many times over the past several years.
The above and your follow up patch is just horrible. It requires users
to know the RT specific context and gives them no help what so ever for
!RT builds.
On 04/05/23 10:42, Peter Zijlstra wrote:
> On Tue, Apr 25, 2023 at 08:43:02AM -0300, Wander Lairson Costa wrote:
>> +static inline void put_task_struct_atomic_safe(struct task_struct *task)
>> [...]
>
> Urgh.. that's plenty horrible. And I'm sure everybody plus kitchen sink
> has already asked why can't we just rcu free the thing unconditionally.
>
> Google only found me an earlier version of this same patch set, but I'm
> sure we've had that discussion many times over the past several years.
> The above and your follow up patch is just horrible.

So on v3/v4 we got to doing that unconditionally for PREEMPT_RT, but per
[1] Wander went back to hand-fixing the problematic callsites.

Now that I'm looking at it again, I couldn't find a concrete argument from
Oleg against doing this unconditionally - as Wander is pointing out in the
changelog and comments, reusing task_struct.rcu for that purpose is safe
(although not necessarily obviously so).

Is this just miscommunication, or is there a genuine issue with doing this
unconditionally? As argued before, I'd also much rather have this be an
unconditional call_rcu() (regardless of context or PREEMPT_RT).
On Thu, May 04, 2023 at 10:42:29AM +0200, Peter Zijlstra wrote:
> On Tue, Apr 25, 2023 at 08:43:02AM -0300, Wander Lairson Costa wrote:
> > +static inline void put_task_struct_atomic_safe(struct task_struct *task)
> > [...]
>
> Urgh.. that's plenty horrible. And I'm sure everybody plus kitchen sink
> has already asked why can't we just rcu free the thing unconditionally.
>
> Google only found me an earlier version of this same patch set, but I'm
> sure we've had that discussion many times over the past several years.
> The above and your follow up patch is just horrible.
>
> It requires users to know the RT specific context and gives them no help
> what so ever for !RT builds.

No problem, I will send a new version doing it unconditionally.
On Thu, May 04, 2023 at 10:32:31AM +0100, Valentin Schneider wrote:
> On 04/05/23 10:42, Peter Zijlstra wrote:
> > [...]
> > Urgh.. that's plenty horrible. And I'm sure everybody plus kitchen sink
> > has already asked why can't we just rcu free the thing unconditionally.
>
> So on v3/v4 we got to doing that unconditionally for PREEMPT_RT, but per
> [1] Wander went back to hand-fixing the problematic callsites.
>
> Now that I'm looking at it again, I couldn't find a concrete argument from
> Oleg against doing this unconditionally - as Wander is pointing out in the
> changelog and comments, reusing task_struct.rcu for that purpose is safe
> (although not necessarily obviously so).
>
> Is this just miscommunication, or is there a genuine issue with doing this
> unconditionally? As argued before, I'd also much rather have this be an
> unconditional call_rcu() (regardless of context or PREEMPT_RT).

Yeah, I think it was a misunderstanding of mine.
On 05/04, Peter Zijlstra wrote:
>
> Urgh.. that's plenty horrible. And I'm sure everybody plus kitchen sink
> has already asked why can't we just rcu free the thing unconditionally.
>
> Google only found me an earlier version of this same patch set, but I'm
> sure we've had that discussion many times over the past several years.

Yes... see for example

https://lore.kernel.org/lkml/CAHk-=whtj+aSYftniMRG4xvFE8dmmYyrqcJyPmzStsfj5w9r=w@mail.gmail.com/

We already have an rcu pass before put_task_struct(zombie), see
put_task_struct_rcu_user(); another one looks unfortunate.

Oleg.
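For readers following along, the "rcu pass" Oleg refers to hangs off a
second refcount in task_struct. A rough summary (field and function
names from the kernel; the comment itself is a sketch, not from this
thread):

/*
 * task_struct carries two refcounts:
 *
 *	refcount_t usage;	// dropped by put_task_struct()
 *	refcount_t rcu_users;	// dropped by put_task_struct_rcu_user()
 *
 * When rcu_users hits zero, delayed_put_task_struct() runs via
 * call_rcu(&task->rcu, ...) after a grace period and performs the
 * final put_task_struct(). rcu_users itself holds a 'usage'
 * reference, which is why ->usage cannot reach zero before the
 * rcu_users 1 -> 0 transition, and why ->rcu can be reused here
 * without conflict.
 */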
On Thu, May 04, 2023 at 02:29:45PM +0200, Oleg Nesterov wrote:
> On 05/04, Peter Zijlstra wrote:
> >
> > Urgh.. that's plenty horrible. And I'm sure everybody plus kitchen sink
> > has already asked why can't we just rcu free the thing unconditionally.
> >
> > Google only found me an earlier version of this same patch set, but I'm
> > sure we've had that discussion many times over the past several years.
>
> Yes... see for example
>
> https://lore.kernel.org/lkml/CAHk-=whtj+aSYftniMRG4xvFE8dmmYyrqcJyPmzStsfj5w9r=w@mail.gmail.com/
>
> We already have an rcu pass before put_task_struct(zombie), see
> put_task_struct_rcu_user(), another one look unfortunate.

Ah indeed, it got mentioned there as well. And Linus seems to be arguing
against doing an rcu free there. So humm..

Then I'm thinking something trivial like so:

static inline void put_task_struct(struct task_struct *t)
{
	if (!refcount_dec_and_test(&t->usage))
		return;

	if (IS_ENABLED(CONFIG_PREEMPT_RT) && !preemptible())
		call_rcu(&t->rcu, __put_task_struct_rcu);

	__put_task_struct(t);
}

should do, or alternatively use irq_work, which has a much lower
latency, but meh..
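The __put_task_struct_rcu() callback named in Peter's snippet is never
spelled out in the thread; a minimal sketch, assuming it mirrors
__delayed_put_task_struct() from this patch, would be:

#include <linux/sched/task.h>
#include <linux/rcupdate.h>

/* Sketch only: name and body assumed, modeled on the patch's
 * __delayed_put_task_struct(). */
static void __put_task_struct_rcu(struct rcu_head *rhp)
{
	/* Recover the task from its embedded rcu_head. */
	struct task_struct *t = container_of(rhp, struct task_struct, rcu);

	__put_task_struct(t);
}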
On Thu, May 4, 2023 at 11:34 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Thu, May 04, 2023 at 02:29:45PM +0200, Oleg Nesterov wrote:
> > [...]
> > We already have an rcu pass before put_task_struct(zombie), see
> > put_task_struct_rcu_user(), another one look unfortunate.
>
> Ah indeed, it got mentioned there as well. And Linus seems to be arguing
> against doing an rcu free there. So humm..
>
> Then I'm thinking something trivial like so:
>
> static inline void put_task_struct(struct task_struct *t)
> {
>         if (!refcount_dec_and_test(&t->usage))
>                 return;
>
>         if (IS_ENABLED(CONFIG_PREEMPT_RT) && !preemptible())
>                 call_rcu(&t->rcu, __put_task_struct_rcu);
>
>         __put_task_struct(t);
> }

That's what v5 [1] does. What would be the path in this case? Should I
resend it as v8?

> should do, or alternatively use irq_work, which has a much lower
> latency, but meh..

Initially, I did that. I switched to call_rcu because the code got much
simpler.

[1] https://lore.kernel.org/all/20230210161323.37400-1-wander@redhat.com/
On 05/04, Wander Lairson Costa wrote:
>
> On Thu, May 4, 2023 at 11:34 AM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > static inline void put_task_struct(struct task_struct *t)
> > [...]
>
> That's what v5 [1] does.

Yes, but as Sebastian explained CONFIG_PROVE_RAW_LOCK_NESTING won't like it.

https://lore.kernel.org/all/Y+zFNrCjBn53%2F+Q2@linutronix.de/

Oleg.
On Thu, May 04, 2023 at 11:55:15AM -0300, Wander Lairson Costa wrote:
> > Then I'm thinking something trivial like so:
> >
> > static inline void put_task_struct(struct task_struct *t)
> > [...]
>
> That's what v5 [1] does. What would be the path in this case? Should I
> resend it as v8?

It's almost what v5 does. v5 also has a !in_task() thing. v5 also
violates codingstyle :-)

But yeah.. something like that.
On Thu, May 04, 2023 at 05:23:07PM +0200, Oleg Nesterov wrote:
> On 05/04, Wander Lairson Costa wrote:
> > [...]
> > That's what v5 [1] does.
>
> Yes, but as Sebastian explained CONFIG_PROVE_RAW_LOCK_NESTING won't like it.

This can help:

https://lkml.kernel.org/r/168303194177.404.8610123576035502891.tip-bot2@tip-bot2
On Thu, May 04, 2023 at 05:24:24PM +0200, Peter Zijlstra wrote:
> On Thu, May 04, 2023 at 11:55:15AM -0300, Wander Lairson Costa wrote:
> > > Then I'm thinking something trivial like so:
> > > [...]
> >
> > That's what v5 [1] does. What would be the path in this case? Should I
> > resend it as v8?
>
> It's almost what v5 does. v5 also has a !in_task() thing. v5 also
> violates codingstyle :-)

IIRC, the in_task() is there because preemptible() doesn't check if it
is running in interrupt context.

> But yeah.. something like that.
On Thu, May 4, 2023 at 12:23 PM Oleg Nesterov <oleg@redhat.com> wrote:
>
> On 05/04, Wander Lairson Costa wrote:
> > [...]
> > That's what v5 [1] does.
>
> Yes, but as Sebastian explained CONFIG_PROVE_RAW_LOCK_NESTING won't like it.
>
> https://lore.kernel.org/all/Y+zFNrCjBn53%2F+Q2@linutronix.de/

I think that was my confusion in that thread. My understanding is that
CONFIG_PROVE_RAW_LOCK_NESTING will check lock ordering but not context.
I feel my ignorance is blocking me from seeing something that is
obvious to everyone else :/

> Oleg.
On 05/04, Wander Lairson Costa wrote:
>
> On Thu, May 4, 2023 at 12:23 PM Oleg Nesterov <oleg@redhat.com> wrote:
> >
> > Yes, but as Sebastian explained CONFIG_PROVE_RAW_LOCK_NESTING won't like it.
> >
> > https://lore.kernel.org/all/Y+zFNrCjBn53%2F+Q2@linutronix.de/
>
> I think that was my confusion in that thread. My understanding is that
> CONFIG_PROVE_RAW_LOCK_NESTING will check lock ordering but not
> context.

Sorry, I don't understand... perhaps I missed something. But iiuc
the problem is simple.

So, this code

	raw_spin_lock(one);
	spin_lock(two);

is obviously wrong if CONFIG_PREEMPT_RT.

Without PREEMPT_RT this code is fine because raw_spinlock_t and spinlock_t
are the same thing. Except they have different lockdep annotations if
CONFIG_PROVE_RAW_LOCK_NESTING is true, LD_WAIT_SPIN and LD_WAIT_CONFIG.

So if CONFIG_PROVE_RAW_LOCK_NESTING is set, lockdep will complain even
on the !PREEMPT_RT kernel, iow it checks the nesting as if the code runs
with PREEMPT_RT.

Cough... not sure my explanation can help ;) It looks very confusing when
I read it.

Oleg.
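To make the wait-type point concrete, here is a minimal sketch
(hypothetical locks, not code from the thread) of the nesting that
CONFIG_PROVE_RAW_LOCK_NESTING reports:

#include <linux/spinlock.h>

static DEFINE_RAW_SPINLOCK(outer);	/* wait type LD_WAIT_SPIN */
static DEFINE_SPINLOCK(inner);		/* wait type LD_WAIT_CONFIG */

static void bad_nesting(void)
{
	raw_spin_lock(&outer);
	/*
	 * On PREEMPT_RT 'inner' becomes a sleeping rtmutex, so with
	 * CONFIG_PROVE_RAW_LOCK_NESTING lockdep reports an
	 * "Invalid wait context" here even on a !PREEMPT_RT build.
	 */
	spin_lock(&inner);
	spin_unlock(&inner);
	raw_spin_unlock(&outer);
}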
On Thu, May 4, 2023 at 4:23 PM Oleg Nesterov <oleg@redhat.com> wrote:
>
> On 05/04, Wander Lairson Costa wrote:
> > [...]
> > I think that was my confusion in that thread. My understanding is that
> > CONFIG_PROVE_RAW_LOCK_NESTING will check lock ordering but not
> > context.
>
> So, this code
>
>	raw_spin_lock(one);
>	spin_lock(two);
>
> is obviously wrong if CONFIG_PREEMPT_RT.
> [...]
> Cough... not sure my explanation can help ;) It looks very confusing when
> I read it.

Thanks for the explanation. That's my understanding too. The part I
don't get is why this would fail with a call_rcu() inside
put_task_struct().

> Oleg.
Hi Wander,

I certainly missed something ;) plus I am already sleeping, but let me
try to reply anyway.

On 05/04, Wander Lairson Costa wrote:
> On Thu, May 4, 2023 at 4:23 PM Oleg Nesterov <oleg@redhat.com> wrote:
> > [...]
> > Cough... not sure my explanation can help ;) It looks very confusing when
> > I read it.
>
> Thanks for the explanation. That's my understanding too. The part I
> don't get is why this would fail with a call_rcu() inside
> put_task_struct().

The problem is that call_rcu() won't be called if !IS_ENABLED(PREEMPT_RT);
__put_task_struct() will be called. CONFIG_PROVE_RAW_LOCK_NESTING can't
know this can't happen if PREEMPT_RT is set.

IOW. To simplify, suppose we have

	// can be called in atomic context, e.g. under
	// raw_spin_lock(), so it is wrong with PREEMPT_RT
	void __put_task_struct(struct task_struct *tsk)
	{
		spin_lock(some_lock);
	}

lets "fix" the code above, lets change __put_task_struct:

	void __put_task_struct(struct task_struct *tsk)
	{
		if (IS_ENABLED(CONFIG_PREEMPT_RT))
			return;

		spin_lock(some_lock);
	}

Now, if CONFIG_PREEMPT_RT is true then __put_task_struct() is fine
wrt lock nesting.

But, if CONFIG_PREEMPT_RT is not set, then __put_task_struct() still
does the same:

	void __put_task_struct(struct task_struct *tsk)
	{
		spin_lock(some_lock);
	}

and CONFIG_PROVE_RAW_LOCK_NESTING will complain. Because, once again,
it checks the nesting as if CONFIG_PREEMPT_RT is true, and in that case
__put_task_struct() is wrong if it is called under raw_spin_lock().

Oleg.
On Thu, May 04, 2023 at 03:21:11PM -0300, Wander Lairson Costa wrote:
> On Thu, May 04, 2023 at 05:24:24PM +0200, Peter Zijlstra wrote:
> > [...]
> > It's almost what v5 does. v5 also has a !in_task() thing. v5 also
> > violates codingstyle :-)
>
> IIRC, the in_task() is there because preemptible() doesn't check if it
> is running in interrupt context.

#define preemptible()	(preempt_count() == 0 && !irqs_disabled())

When in interrupt context preempt_count() will have a non-zero value in
HARDIRQ_MASK and IRQs must be disabled, so preemptible() evaluates to
(false && false), last time I checked that ends up being false.
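Spelled out, following the accounting in include/linux/preempt.h (a
worked reading of Peter's point, not text from the thread):

/*
 * In a hardirq handler, irq_enter() has added HARDIRQ_OFFSET to
 * preempt_count(), and the handler runs with interrupts disabled:
 *
 *	preempt_count() == 0	-> false (HARDIRQ bits are set)
 *	!irqs_disabled()	-> false (IRQs are off)
 *
 * so preemptible() == (false && false) == false. Either half of the
 * conjunction alone already fails, which is why the
 * IS_ENABLED(CONFIG_PREEMPT_RT) && !preemptible() test covers hard
 * interrupt context without a separate in_task() check.
 */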
On Thu, May 04, 2023 at 05:30:57PM +0200, Peter Zijlstra wrote:
> On Thu, May 04, 2023 at 05:23:07PM +0200, Oleg Nesterov wrote:
> > Yes, but as Sebastian explained CONFIG_PROVE_RAW_LOCK_NESTING won't like it.
>
> This can help:
>
> https://lkml.kernel.org/r/168303194177.404.8610123576035502891.tip-bot2@tip-bot2

Explicitly:

static inline void put_task_struct(struct task_struct *t)
{
	if (!refcount_dec_and_test(&t->usage))
		return;

	if (!IS_ENABLED(CONFIG_PREEMPT_RT) || preemptible()) {
		/*
		 * ... same comment as the other patch ...
		 */
		static DEFINE_WAIT_OVERRIDE_MAP(put_task_map, LD_WAIT_SLEEP);

		lock_map_acquire_try(&put_task_map);
		__put_task_struct(t);
		lock_map_release(&put_task_map);
		return;
	}

	call_rcu(&t->rcu, __put_task_struct_rcu);
}

Should not complain since we tell it to STFU :-)
On Fri, 5 May 2023 15:32:35 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> On Thu, May 04, 2023 at 03:21:11PM -0300, Wander Lairson Costa wrote:
> > [...]
> > IIRC, the in_task() is there because preemptible() doesn't check if it
> > is running in interrupt context.
>
> #define preemptible()	(preempt_count() == 0 && !irqs_disabled())
>
> When in interrupt context preempt_count() will have a non-zero value in
> HARDIRQ_MASK and IRQs must be disabled, so preemptible() evaluates to
> (false && false), last time I checked that ends up being false.

Interesting, I can't find v5 anywhere in my mail folders (but I have
v4 and v6!). Anyway, from just the context of this email, and seeing
IS_ENABLED(CONFIG_PREEMPT_RT), I'm guessing that in_task() returns false
if it's running in an interrupt thread, where preemptible() does not.

-- Steve
On Fri, 5 May 2023 10:26:02 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> > > IIRC, the in_task() is there because preemptible() doesn't check if it
> > > is running in interrupt context.
> >
> > #define preemptible()	(preempt_count() == 0 && !irqs_disabled())
> >
> > When in interrupt context preempt_count() will have a non-zero value in
> > HARDIRQ_MASK and IRQs must be disabled, so preemptible() evaluates to
> > (false && false), last time I checked that ends up being false.
>
> Interesting, I can't find v5 anywhere in my mail folders (but I have
> v4 and v6!). Anyway, from just the context of this email, and seeing
> IS_ENABLED(CONFIG_PREEMPT_RT), I'm guessing that in_task() returns false
> if it's running in an interrupt thread, where preemptible() does not.

But then I question, does it matter if it is running in an interrupt
thread or not for put_task_struct()?

-- Steve
On Fri, May 5, 2023 at 10:33 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Thu, May 04, 2023 at 03:21:11PM -0300, Wander Lairson Costa wrote:
> > [...]
> > IIRC, the in_task() is there because preemptible() doesn't check if it
> > is running in interrupt context.
>
> #define preemptible()	(preempt_count() == 0 && !irqs_disabled())
>
> When in interrupt context preempt_count() will have a non-zero value in
> HARDIRQ_MASK and IRQs must be disabled, so preemptible() evaluates to
> (false && false), last time I checked that ends up being false.

When I first tested, I started with in_irq(), then I saw some warnings
and added preemptible(). Never tested with preemptible() alone.
Thanks, I will change that and test it.
On Thu, May 4, 2023 at 5:16 PM Oleg Nesterov <oleg@redhat.com> wrote:
>
> Hi Wander,
>
> I certainly missed something ;) plus I am already sleeping, but let me
> try to reply anyway.
> [...]
> IOW. To simplify, suppose we have
>
>	// can be called in atomic context, e.g. under
>	// raw_spin_lock(), so it is wrong with PREEMPT_RT
>	void __put_task_struct(struct task_struct *tsk)
>	{
>		spin_lock(some_lock);
>	}
> [...]
> and CONFIG_PROVE_RAW_LOCK_NESTING will complain. Because, once again,
> it checks the nesting as if CONFIG_PREEMPT_RT is true, and in that case
> __put_task_struct() is wrong if it is called under raw_spin_lock().

IIUC, this is a problem with the current implementation, isn't it?
Because the holder of the raw_spin_lock is the put_task_struct() caller.

> Oleg.
Patch

diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h
index b597b97b1f8f..cf774b83b2ec 100644
--- a/include/linux/sched/task.h
+++ b/include/linux/sched/task.h
@@ -141,6 +141,41 @@ static inline void put_task_struct_many(struct task_struct *t, int nr)
 
 void put_task_struct_rcu_user(struct task_struct *task);
 
+extern void __delayed_put_task_struct(struct rcu_head *rhp);
+
+static inline void put_task_struct_atomic_safe(struct task_struct *task)
+{
+	if (IS_ENABLED(CONFIG_PREEMPT_RT)) {
+		/*
+		 * Decrement the refcount explicitly to avoid unnecessarily
+		 * calling call_rcu.
+		 */
+		if (refcount_dec_and_test(&task->usage))
+			/*
+			 * under PREEMPT_RT, we can't call put_task_struct
+			 * in atomic context because it will indirectly
+			 * acquire sleeping locks.
+			 * call_rcu() will schedule __delayed_put_task_struct()
+			 * to be called in process context.
+			 *
+			 * __put_task_struct() is called when
+			 * refcount_dec_and_test(&t->usage) succeeds.
+			 *
+			 * This means that it can't conflict with
+			 * put_task_struct_rcu_user() which abuses ->rcu the same
+			 * way; rcu_users has a reference so task->usage can't be
+			 * zero after rcu_users 1 -> 0 transition.
+			 *
+			 * delayed_free_task() also uses ->rcu, but it is only called
+			 * when it fails to fork a process. Therefore, there is no
+			 * way it can conflict with put_task_struct().
+			 */
+			call_rcu(&task->rcu, __delayed_put_task_struct);
+	} else {
+		put_task_struct(task);
+	}
+}
+
 /* Free all architecture-specific resources held by a thread. */
 void release_thread(struct task_struct *dead_task);
 
diff --git a/kernel/fork.c b/kernel/fork.c
index ea332319dffe..7f016b691b1d 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -854,6 +854,14 @@ void __put_task_struct(struct task_struct *tsk)
 }
 EXPORT_SYMBOL_GPL(__put_task_struct);
 
+void __delayed_put_task_struct(struct rcu_head *rhp)
+{
+	struct task_struct *task = container_of(rhp, struct task_struct, rcu);
+
+	__put_task_struct(task);
+}
+EXPORT_SYMBOL_GPL(__delayed_put_task_struct);
+
 void __init __weak arch_task_cache_init(void) { }
 
 /*