Message ID | 20221218191310.130904-3-joel@joelfernandes.org |
State | New |
Headers |
From: "Joel Fernandes (Google)" <joel@joelfernandes.org>
To: linux-kernel@vger.kernel.org
Cc: "Joel Fernandes (Google)" <joel@joelfernandes.org>, Josh Triplett <josh@joshtriplett.org>, Lai Jiangshan <jiangshanlai@gmail.com>, Mathieu Desnoyers <mathieu.desnoyers@efficios.com>, "Paul E. McKenney" <paulmck@kernel.org>, rcu@vger.kernel.org, Steven Rostedt <rostedt@goodmis.org>
Subject: [RFC 2/2] srcu: Remove memory barrier "E" as it is not required
Date: Sun, 18 Dec 2022 19:13:09 +0000
Message-Id: <20221218191310.130904-3-joel@joelfernandes.org>
In-Reply-To: <20221218191310.130904-1-joel@joelfernandes.org>
References: <20221218191310.130904-1-joel@joelfernandes.org>
Series | srcu: Remove pre-flip memory barrier
Commit Message
Joel Fernandes
Dec. 18, 2022, 7:13 p.m. UTC
During a flip, we have a full memory barrier before idx is incremented.
The effect of this seems to be to guarantee that, if a READER sees the
srcu_idx update (srcu_flip), then prior scans cannot have seen that
reader's updates to the counters at the new index.
That does not matter, for the following reason: if a prior scan did see
counter updates on the new index, then that scan would merely wait for a
reader that it probably did not need to wait for.
And if the prior scan did see both lock and unlock count updates on that index,
that reader is essentially done, so it is OK to end the grace period.
For this reason, remove the full memory barrier before incrementing
srcu_idx.
Six hours of testing show that all SRCU-* scenarios pass with this change.
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
kernel/rcu/srcutree.c | 8 --------
1 file changed, 8 deletions(-)
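For context, the function being modified looks roughly like this before
the patch (a sketch based on kernel/rcu/srcutree.c circa v6.1, comments
abridged). Barrier "E" is the one this patch removes; the post-flip
barrier "D" after the increment stays:

static void srcu_flip(struct srcu_struct *ssp)
{
        smp_mb(); /* E */  /* Pairs with B and C; removed by this patch. */

        WRITE_ONCE(ssp->srcu_idx, ssp->srcu_idx + 1);

        smp_mb(); /* D */  /* Pairs with C. */
}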
Comments
On Sun, Dec 18, 2022 at 07:13:09PM +0000, Joel Fernandes (Google) wrote:
> During a flip, we have a full memory barrier before idx is incremented.
>
> The effect of this seems to be to guarantee that, if a READER sees the
> srcu_idx update (srcu_flip), then prior scans cannot have seen that
> reader's updates to the counters at the new index.
>
> That does not matter, for the following reason: if a prior scan did see
> counter updates on the new index, then that scan would merely wait for a
> reader that it probably did not need to wait for.

I'm confused: isn't that actually what we want to prevent? The point of
the barrier here is to make sure that the inactive index that we just
scanned is guaranteed to remain seen as inactive during the whole scan
(minus the possible twice-residual increments from a given task that we
debated on Paul's patch, but we want the guarantee that the inactive
index won't be incremented thrice by a given task, or any further, while
we are scanning it).

If some readers see the new index and increment the lock count, and we
see that while we are scanning it, there is a risk that the GP is going
to be delayed indefinitely.

> @@ -982,14 +982,6 @@ static bool try_check_zero(struct srcu_struct *ssp, int idx, int trycount)
>   */
>  static void srcu_flip(struct srcu_struct *ssp)
>  {
> -        /*
> -         * Ensure that if a given reader sees the new value of ->srcu_idx, this
> -         * updater's earlier scans cannot have seen that reader's increments
> -         * (which is OK, because this grace period need not wait on that
> -         * reader).
> -         */
> -        smp_mb(); /* E */ /* Pairs with B and C. */

That said, I've been staring at this very barrier for the whole day, and
I'm wondering what exactly it matches on the other end:

    UPDATER                                  READER
    -------                                  ------
    idx = ssp->srcu_idx;                     idx = srcu_idx;
    READ srcu_unlock_count[srcu_idx ^ 1]     srcu_lock_count[idx]++
    smp_mb();                                smp_mb();
    READ srcu_lock_count[srcu_idx ^ 1]       srcu_unlock_count[old_idx]++
    smp_mb()
    srcu_idx++;

For a true match, I would expect a barrier between the srcu_idx read and
the srcu_lock_count write. I'm not used to ordering writes after reads.
So what is the pattern here? I would expect something like the below,
but that doesn't match the above:

C rwrw

{}

P0(int *X, int *Y)
{
        int x;

        x = READ_ONCE(*X);
        smp_mb();
        WRITE_ONCE(*Y, 1);
}

P1(int *X, int *Y)
{
        int y;

        y = READ_ONCE(*Y);
        smp_mb();
        WRITE_ONCE(*X, 1);
}

exists (0:x=1 /\ 1:y=1)

> -
>          WRITE_ONCE(ssp->srcu_idx, ssp->srcu_idx + 1);
>
>          /*
> --
> 2.39.0.314.g84b9a713c41-goog
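(A side note on the rwrw pattern above: saved as a standalone rwrw.litmus
file, it can be checked against the Linux-kernel memory model shipped in
tools/memory-model, e.g.:

        herd7 -conf linux-kernel.cfg rwrw.litmus

herd7 should report the "exists" clause as "Never": each CPU's smp_mb()
orders its read before its subsequent write, so the outcome in which both
reads observe the other CPU's later write is forbidden.)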
On Sun, Dec 18, 2022 at 10:42:43PM +0100, Frederic Weisbecker wrote:
> On Sun, Dec 18, 2022 at 07:13:09PM +0000, Joel Fernandes (Google) wrote:
> > During a flip, we have a full memory barrier before idx is incremented.
> >
> > The effect of this seems to be to guarantee that, if a READER sees the
> > srcu_idx update (srcu_flip), then prior scans cannot have seen that
> > reader's updates to the counters at the new index.
> >
> > That does not matter, for the following reason: if a prior scan did see
> > counter updates on the new index, then that scan would merely wait for a
> > reader that it probably did not need to wait for.
>
> I'm confused: isn't that actually what we want to prevent? The point of
> the barrier here is to make sure that the inactive index that we just
> scanned is guaranteed to remain seen as inactive during the whole scan
> (minus the possible twice-residual increments from a given task that we
> debated on Paul's patch, but we want the guarantee that the inactive
> index won't be incremented thrice by a given task, or any further, while
> we are scanning it).

I believe you are talking about the memory barrier after the flip; that
is the one that guarantees what you are describing, I feel. That is,
readers see the newly inactivated index eventually, so that we are not
scanning indefinitely.

For that, we need an smp_mb() after the flip but before the second scan,
which is a much-needed memory barrier IMHO, and not what this patch is
talking about.

> If some readers see the new index and increment the lock count, and we
> see that while we are scanning it, there is a risk that the GP is going
> to be delayed indefinitely.

The "new" index is the index after the flip; do you mean the "old"
index, i.e. the index before the flip? That is what barrier E is talking
about, not the index after the flip.

> > @@ -982,14 +982,6 @@ static bool try_check_zero(struct srcu_struct *ssp, int idx, int trycount)
> >   */
> >  static void srcu_flip(struct srcu_struct *ssp)
> >  {
> > -        /*
> > -         * Ensure that if a given reader sees the new value of ->srcu_idx, this
> > -         * updater's earlier scans cannot have seen that reader's increments
> > -         * (which is OK, because this grace period need not wait on that
> > -         * reader).
> > -         */
> > -        smp_mb(); /* E */ /* Pairs with B and C. */
>
> That said, I've been staring at this very barrier for the whole day, and
> I'm wondering what exactly it matches on the other end:
>
>     UPDATER                                  READER
>     -------                                  ------
>     idx = ssp->srcu_idx;                     idx = srcu_idx;
>     READ srcu_unlock_count[srcu_idx ^ 1]     srcu_lock_count[idx]++
>     smp_mb();                                smp_mb();
>     READ srcu_lock_count[srcu_idx ^ 1]       srcu_unlock_count[old_idx]++
>     smp_mb()
>     srcu_idx++;
>
> For a true match, I would expect a barrier between the srcu_idx read and
> the srcu_lock_count write. I'm not used to ordering writes after reads.
> So what is the pattern here? I would expect something like the below,
> but that doesn't match the above:

IMHO, it is matching updates to the index and the lock count of a reader.

> C rwrw
>
> {}
>
> P0(int *X, int *Y)
> {
>         int x;
>
>         x = READ_ONCE(*X);
>         smp_mb();
>         WRITE_ONCE(*Y, 1);
> }
>
> P1(int *X, int *Y)
> {
>         int y;
>
>         y = READ_ONCE(*Y);
>         smp_mb();
>         WRITE_ONCE(*X, 1);
> }
>
> exists (0:x=1 /\ 1:y=1)

Hmm, I guess first let's degenerate the real code to an access pattern:

READER                            UPDATER

                                  scanner() {
                                      count_all_unlocks();
                                      smp_mb();
                                      count_all_locks();   (Y)
                                  }
rcu_read_lock() {
    idx = READ(idx);     (X)
    lock_count[idx]++;

    smp_mb(); // mb B
}

rcu_read_unlock() {
    smp_mb(); // mb C
    unlock_count[idx]++;
}
                                  srcu_flip() {
                                      smp_mb(); // E
                                      idx++;           (X)
rcu_read_lock() {
    idx = READ(idx);
    lock_count[idx]++;   (Y)

    smp_mb(); // mb B                 smp_mb();
}                                 }

That becomes:

// READER
P0(int *X, int *Y)
{
        int r0;

        r0 = READ_ONCE(*X);  // PP
        smp_mb();            // B+C // QQ
        WRITE_ONCE(*Y, 1);   // RR
}

// UPDATER
P1(int *X, int *Y)
{
        int r1;

        r1 = READ_ONCE(*Y);  // SS
        smp_mb();            // E // TT
        WRITE_ONCE(*X, 1);   // UU
}

Impossible that:
exists (0:r0=1 /\ 1:r1=1)

Because if r0=1, there is a PP ->rf UU relation, so because of the
smp_mb() it is impossible that r1=1.

So "E" is saying: if a reader saw the new idx (the "X" in the litmus
test), then the previous scan, where the updater counted all the locks
(SS), cannot have seen the lock count updates made at the new index.

However, that does not matter IMHO, because due to preemption after the
current index is sampled, we have no control anyway over which lock
counts are incremented, so this cannot affect correctness.

And if forward progress is a problem, we are doing a full memory barrier
after the flip anyway, so I am not seeing the point of "E".

thanks,

- Joel
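(The scanner() in the pattern above corresponds to the real scan-side
check; for reference, a lightly abridged sketch from
kernel/rcu/srcutree.c circa v6.1, comments elided, where the scan's full
barrier is the one labeled "A":

static bool srcu_readers_active_idx_check(struct srcu_struct *ssp, int idx)
{
        unsigned long unlocks;

        unlocks = srcu_readers_unlock_idx(ssp, idx);       /* count_all_unlocks() */
        smp_mb(); /* A */
        return srcu_readers_lock_idx(ssp, idx) == unlocks; /* count_all_locks() */
})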
On Sun, Dec 18, 2022 at 6:26 PM Joel Fernandes <joel@joelfernandes.org> wrote:
>
> On Sun, Dec 18, 2022 at 10:42:43PM +0100, Frederic Weisbecker wrote:
> > On Sun, Dec 18, 2022 at 07:13:09PM +0000, Joel Fernandes (Google) wrote:
> > > During a flip, we have a full memory barrier before idx is incremented.
> > >
> > > The effect of this seems to be to guarantee that, if a READER sees the
> > > srcu_idx update (srcu_flip), then prior scans cannot have seen that
> > > reader's updates to the counters at the new index.
> > >
> > > That does not matter, for the following reason: if a prior scan did see
> > > counter updates on the new index, then that scan would merely wait for a
> > > reader that it probably did not need to wait for.
> >
> > I'm confused: isn't that actually what we want to prevent? The point of
> > the barrier here is to make sure that the inactive index that we just
> > scanned is guaranteed to remain seen as inactive during the whole scan
> > (minus the possible twice-residual increments from a given task that we
> > debated on Paul's patch, but we want the guarantee that the inactive
> > index won't be incremented thrice by a given task, or any further, while
> > we are scanning it).
>
> I believe you are talking about the memory barrier after the flip; that
> is the one that guarantees what you are describing, I feel. That is,
> readers see the newly inactivated index eventually, so that we are not
> scanning indefinitely.
>
> For that, we need an smp_mb() after the flip but before the second scan,
> which is a much-needed memory barrier IMHO, and not what this patch is
> talking about.
>
> > If some readers see the new index and increment the lock count, and we
> > see that while we are scanning it, there is a risk that the GP is going
> > to be delayed indefinitely.
>
> The "new" index is the index after the flip; do you mean the "old"
> index, i.e. the index before the flip? That is what barrier E is talking
> about, not the index after the flip.
>
> > > @@ -982,14 +982,6 @@ static bool try_check_zero(struct srcu_struct *ssp, int idx, int trycount)
> > >   */
> > >  static void srcu_flip(struct srcu_struct *ssp)
> > >  {
> > > -        /*
> > > -         * Ensure that if a given reader sees the new value of ->srcu_idx, this
> > > -         * updater's earlier scans cannot have seen that reader's increments
> > > -         * (which is OK, because this grace period need not wait on that
> > > -         * reader).
> > > -         */
> > > -        smp_mb(); /* E */ /* Pairs with B and C. */
> >
> > That said, I've been staring at this very barrier for the whole day, and
> > I'm wondering what exactly it matches on the other end:
> >
> >     UPDATER                                  READER
> >     -------                                  ------
> >     idx = ssp->srcu_idx;                     idx = srcu_idx;
> >     READ srcu_unlock_count[srcu_idx ^ 1]     srcu_lock_count[idx]++
> >     smp_mb();                                smp_mb();
> >     READ srcu_lock_count[srcu_idx ^ 1]       srcu_unlock_count[old_idx]++
> >     smp_mb()
> >     srcu_idx++;
> >
> > For a true match, I would expect a barrier between the srcu_idx read and
> > the srcu_lock_count write. I'm not used to ordering writes after reads.
> > So what is the pattern here? I would expect something like the below,
> > but that doesn't match the above:
>
> IMHO, it is matching updates to the index and the lock count of a reader.
>
> > C rwrw
> >
> > {}
> >
> > P0(int *X, int *Y)
> > {
> >         int x;
> >
> >         x = READ_ONCE(*X);
> >         smp_mb();
> >         WRITE_ONCE(*Y, 1);
> > }
> >
> > P1(int *X, int *Y)
> > {
> >         int y;
> >
> >         y = READ_ONCE(*Y);
> >         smp_mb();
> >         WRITE_ONCE(*X, 1);
> > }
> >
> > exists (0:x=1 /\ 1:y=1)
>
> Hmm, I guess first let's degenerate the real code to an access pattern:
>
> READER                            UPDATER
>
>                                   scanner() {
>                                       count_all_unlocks();
>                                       smp_mb();
>                                       count_all_locks();   (Y)
>                                   }
> rcu_read_lock() {
>     idx = READ(idx);     (X)
>     lock_count[idx]++;
>
>     smp_mb(); // mb B
> }
>
> rcu_read_unlock() {
>     smp_mb(); // mb C
>     unlock_count[idx]++;
> }
>                                   srcu_flip() {
>                                       smp_mb(); // E
>                                       idx++;           (X)
> rcu_read_lock() {
>     idx = READ(idx);
>     lock_count[idx]++;   (Y)
>
>     smp_mb(); // mb B                 smp_mb();
> }                                 }
>
> That becomes:
>
> // READER
> P0(int *X, int *Y)
> {
>         int r0;
>
>         r0 = READ_ONCE(*X);  // PP
>         smp_mb();            // B+C // QQ
>         WRITE_ONCE(*Y, 1);   // RR
> }
>
> // UPDATER
> P1(int *X, int *Y)
> {
>         int r1;
>
>         r1 = READ_ONCE(*Y);  // SS
>         smp_mb();            // E // TT
>         WRITE_ONCE(*X, 1);   // UU
> }
>
> Impossible that:
> exists (0:r0=1 /\ 1:r1=1)
>
> Because if r0=1, there is a PP ->rf UU relation, so because of the
> smp_mb() it is impossible that r1=1.
>
> So "E" is saying: if a reader saw the new idx (the "X" in the litmus
> test), then the previous scan, where the updater counted all the locks
> (SS), cannot have seen the lock count updates made at the new index.
>
> However, that does not matter IMHO, because due to preemption after the
> current index is sampled, we have no control anyway over which lock
> counts are incremented, so this cannot affect correctness.
>
> And if forward progress is a problem, we are doing a full memory barrier
> after the flip anyway, so I am not seeing the point of "E".

And you made me realize that the previous scan (the one that happened
before the flip) does not care about the lock count on the "new" idx
value (because, duh, in program order the previous scan was scanning the
old pre-flip idx0), and if we go for scans before that, we end up
running into the smp_mb() in the previous scan, which is plenty enough.
So I am still not seeing the purpose that "E" serves, as far as it
concerns the comment that this patch deletes.

Anyway, phew, time for a break ;-)

Thanks,

- Joel
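(For reference, the reader-side barriers B and C discussed throughout
this thread live in the reader fast paths; a lightly abridged sketch
from kernel/rcu/srcutree.c circa v6.1:

int __srcu_read_lock(struct srcu_struct *ssp)
{
        int idx;

        idx = READ_ONCE(ssp->srcu_idx) & 0x1;
        this_cpu_inc(ssp->sda->srcu_lock_count[idx]);
        smp_mb(); /* B */  /* Avoid leaking the critical section. */
        return idx;
}

void __srcu_read_unlock(struct srcu_struct *ssp, int idx)
{
        smp_mb(); /* C */  /* Avoid leaking the critical section. */
        this_cpu_inc(ssp->sda->srcu_unlock_count[idx]);
})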
diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
index d6a4c2439ca6..2d2e6d304a43 100644
--- a/kernel/rcu/srcutree.c
+++ b/kernel/rcu/srcutree.c
@@ -982,14 +982,6 @@ static bool try_check_zero(struct srcu_struct *ssp, int idx, int trycount)
  */
 static void srcu_flip(struct srcu_struct *ssp)
 {
-        /*
-         * Ensure that if a given reader sees the new value of ->srcu_idx, this
-         * updater's earlier scans cannot have seen that reader's increments
-         * (which is OK, because this grace period need not wait on that
-         * reader).
-         */
-        smp_mb(); /* E */ /* Pairs with B and C. */
-
         WRITE_ONCE(ssp->srcu_idx, ssp->srcu_idx + 1);

         /*
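With the hunk above applied, srcu_flip() reduces to roughly the
following sketch; the post-flip barrier "D", which the discussion above
leans on for forward progress, is untouched by this patch:

static void srcu_flip(struct srcu_struct *ssp)
{
        WRITE_ONCE(ssp->srcu_idx, ssp->srcu_idx + 1);

        /*
         * Ensure that if the updater misses an __srcu_read_unlock()
         * increment, that task's next __srcu_read_lock() must see the
         * new value of ->srcu_idx.
         */
        smp_mb(); /* D */  /* Pairs with C. */
}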