Message ID | 20230803032408.2514989-1-joel@joelfernandes.org |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org>
From: "Joel Fernandes (Google)" <joel@joelfernandes.org>
To: linux-kernel@vger.kernel.org, rcu@vger.kernel.org
Cc: "Joel Fernandes (Google)" <joel@joelfernandes.org>, Will Deacon <will@kernel.org>, "Paul E. McKenney" <paulmck@kernel.org>, Frederic Weisbecker <frederic@kernel.org>, Neeraj Upadhyay <quic_neeraju@quicinc.com>, Josh Triplett <josh@joshtriplett.org>, Boqun Feng <boqun.feng@gmail.com>, Steven Rostedt <rostedt@goodmis.org>, Mathieu Desnoyers <mathieu.desnoyers@efficios.com>, Lai Jiangshan <jiangshanlai@gmail.com>, Zqiang <qiang.zhang1211@gmail.com>, Jonathan Corbet <corbet@lwn.net>
Subject: [PATCH 1/2] docs: rcu: Add cautionary note on plain-accesses to requirements
Date: Thu, 3 Aug 2023 03:24:06 +0000
Message-ID: <20230803032408.2514989-1-joel@joelfernandes.org> |
Series | [1/2] docs: rcu: Add cautionary note on plain-accesses to requirements |
Commit Message
Joel Fernandes
Aug. 3, 2023, 3:24 a.m. UTC
Add a detailed note to explain the potential side effects of accessing the
gp pointer with a plain load, i.e. without the rcu_dereference() macros,
which might trip up neighboring code that does use rcu_dereference().
I haven't verified this with a compiler, but this is what I gather from
the link below, based on Will's experience with READ_ONCE().
Link: https://lore.kernel.org/all/20230728124412.GA21303@willie-the-truck/
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
.../RCU/Design/Requirements/Requirements.rst | 32 +++++++++++++++++++
1 file changed, 32 insertions(+)
Comments
> 2023年8月3日 11:24,Joel Fernandes (Google) <joel@joelfernandes.org> 写道: > > Add a detailed note to explain the potential side effects of > plain-accessing the gp pointer using a plain load, without using the > rcu_dereference() macros; which might trip neighboring code that does > use rcu_dereference(). > > I haven't verified this with a compiler, but this is what I gather from > the below link using Will's experience with READ_ONCE(). > > Link: https://lore.kernel.org/all/20230728124412.GA21303@willie-the-truck/ > Cc: Will Deacon <will@kernel.org> > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> > --- > .../RCU/Design/Requirements/Requirements.rst | 32 +++++++++++++++++++ > 1 file changed, 32 insertions(+) > > diff --git a/Documentation/RCU/Design/Requirements/Requirements.rst b/Documentation/RCU/Design/Requirements/Requirements.rst > index f3b605285a87..e0b896d3fb9b 100644 > --- a/Documentation/RCU/Design/Requirements/Requirements.rst > +++ b/Documentation/RCU/Design/Requirements/Requirements.rst > @@ -376,6 +376,38 @@ mechanism, most commonly locking or reference counting > .. |high-quality implementation of C11 memory_order_consume [PDF]| replace:: high-quality implementation of C11 ``memory_order_consume`` [PDF] > .. _high-quality implementation of C11 memory_order_consume [PDF]: http://www.rdrop.com/users/paulmck/RCU/consume.2015.07.13a.pdf > > +Note that, there can be strange side effects (due to compiler optimizations) if > +``gp`` is ever accessed using a plain load (i.e. without ``READ_ONCE()`` or > +``rcu_dereference()``) potentially hurting any succeeding > +``rcu_dereference()``. For example, consider the code: > + > + :: > + > + 1 bool do_something_gp(void) > + 2 { > + 3 void *tmp; > + 4 rcu_read_lock(); > + 5 tmp = gp; // Plain-load of GP. > + 6 printk("Point gp = %p\n", tmp); > + 7 > + 8 p = rcu_dereference(gp); > + 9 if (p) { > + 10 do_something(p->a, p->b); > + 11 rcu_read_unlock(); > + 12 return true; > + 13 } > + 14 rcu_read_unlock(); > + 15 return false; > + 16 } > + > +The behavior of plain accesses involved in a data race is non-deterministic in > +the face of compiler optimizations. Since accesses to the ``gp`` pointer is > +by-design a data race, the compiler could trip this code by caching the value > +of ``gp`` into a register in line 5, and then using the value of the register > +to satisfy the load in line 10. Thus it is important to never mix Will’s example is: // Assume *ptr is initially 0 and somebody else writes it to 1 // concurrently foo = *ptr; bar = READ_ONCE(*ptr); baz = *ptr; Then the compiler is within its right to reorder it to: foo = *ptr; baz = *ptr; bar = READ_ONCE(*ptr); So, the result foo == baz == 0 but bar == 1 is perfectly legal. But the example here is different, the compiler can not use the value loaded from line 5 unless the compiler can deduce that the tmp is equals to p in which case the address dependency doesn’t exist anymore. What am I missing here? > +plain accesses of a memory location with rcu_dereference() of the same memory > +location, in code involved in a data race. > + > In short, updaters use rcu_assign_pointer() and readers use > rcu_dereference(), and these two RCU API elements work together to > ensure that readers have a consistent view of newly added data elements. > -- > 2.41.0.585.gd2178a4bd4-goog >
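For readers who want to experiment with the scenario Alan quotes from Will, here is a minimal, self-contained sketch. The READ_ONCE() below is a simplified stand-in (the kernel's real macro in include/asm-generic/rwonce.h is more elaborate), and the concurrent writer is assumed rather than shown:

    #include <stdio.h>

    /* Simplified stand-in for the kernel's READ_ONCE(); the real macro also
     * handles non-scalar types and adds instrumentation hooks. */
    #define READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

    int shared;   /* assume another thread concurrently stores 1 here */

    void reader(void)
    {
            int foo, bar, baz;

            foo = shared;             /* plain load: may be reordered or merged     */
            bar = READ_ONCE(shared);  /* volatile load: emitted where it is written */
            baz = shared;             /* plain load: compiler may reuse foo's value */

            /* Given the concurrent store, foo == baz == 0 && bar == 1 is a
             * perfectly legal outcome, matching Will's reordering above. */
            printf("foo=%d bar=%d baz=%d\n", foo, bar, baz);
    }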
> On Aug 3, 2023, at 8:09 AM, Alan Huang <mmpgouride@gmail.com> wrote: > > >> 2023年8月3日 11:24,Joel Fernandes (Google) <joel@joelfernandes.org> 写道: >> >> Add a detailed note to explain the potential side effects of >> plain-accessing the gp pointer using a plain load, without using the >> rcu_dereference() macros; which might trip neighboring code that does >> use rcu_dereference(). >> >> I haven't verified this with a compiler, but this is what I gather from >> the below link using Will's experience with READ_ONCE(). >> >> Link: https://lore.kernel.org/all/20230728124412.GA21303@willie-the-truck/ >> Cc: Will Deacon <will@kernel.org> >> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> >> --- >> .../RCU/Design/Requirements/Requirements.rst | 32 +++++++++++++++++++ >> 1 file changed, 32 insertions(+) >> >> diff --git a/Documentation/RCU/Design/Requirements/Requirements.rst b/Documentation/RCU/Design/Requirements/Requirements.rst >> index f3b605285a87..e0b896d3fb9b 100644 >> --- a/Documentation/RCU/Design/Requirements/Requirements.rst >> +++ b/Documentation/RCU/Design/Requirements/Requirements.rst >> @@ -376,6 +376,38 @@ mechanism, most commonly locking or reference counting >> .. |high-quality implementation of C11 memory_order_consume [PDF]| replace:: high-quality implementation of C11 ``memory_order_consume`` [PDF] >> .. _high-quality implementation of C11 memory_order_consume [PDF]: http://www.rdrop.com/users/paulmck/RCU/consume.2015.07.13a.pdf >> >> +Note that, there can be strange side effects (due to compiler optimizations) if >> +``gp`` is ever accessed using a plain load (i.e. without ``READ_ONCE()`` or >> +``rcu_dereference()``) potentially hurting any succeeding >> +``rcu_dereference()``. For example, consider the code: >> + >> + :: >> + >> + 1 bool do_something_gp(void) >> + 2 { >> + 3 void *tmp; >> + 4 rcu_read_lock(); >> + 5 tmp = gp; // Plain-load of GP. >> + 6 printk("Point gp = %p\n", tmp); >> + 7 >> + 8 p = rcu_dereference(gp); >> + 9 if (p) { >> + 10 do_something(p->a, p->b); >> + 11 rcu_read_unlock(); >> + 12 return true; >> + 13 } >> + 14 rcu_read_unlock(); >> + 15 return false; >> + 16 } >> + >> +The behavior of plain accesses involved in a data race is non-deterministic in >> +the face of compiler optimizations. Since accesses to the ``gp`` pointer is >> +by-design a data race, the compiler could trip this code by caching the value >> +of ``gp`` into a register in line 5, and then using the value of the register >> +to satisfy the load in line 10. Thus it is important to never mix > > Will’s example is: > > // Assume *ptr is initially 0 and somebody else writes it to 1 > // concurrently > > foo = *ptr; > bar = READ_ONCE(*ptr); > baz = *ptr; > > Then the compiler is within its right to reorder it to: > > foo = *ptr; > baz = *ptr; > bar = READ_ONCE(*ptr); > > So, the result foo == baz == 0 but bar == 1 is perfectly legal. Yes, a bad outcome is perfectly legal amidst data race. Who said it is not legal? > > But the example here is different, That is intentional. Wills discussion partially triggered this. Though I am wondering if we should document that as well. > the compiler can not use the value loaded from line 5 > unless the compiler can deduce that the tmp is equals to p in which case the address dependency > doesn’t exist anymore. > > What am I missing here? Maybe you are trying to rationalize too much that the sequence mentioned cannot result in a counter intuitive outcome like I did? 
The point AFAIU is not just about line 10 but that the compiler can replace any of the lines after the plain access with the cached value. Thanks. > >> +plain accesses of a memory location with rcu_dereference() of the same memory >> +location, in code involved in a data race. >> + >> In short, updaters use rcu_assign_pointer() and readers use >> rcu_dereference(), and these two RCU API elements work together to >> ensure that readers have a consistent view of newly added data elements. >> -- >> 2.41.0.585.gd2178a4bd4-goog >> >
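To make Joel's point concrete, the following hedged sketch shows the kind of transformation being described: the later load of gp is satisfied from the value cached by the plain load, so no fresh, dependency-carrying load remains. Whether a real compiler would do this for the patch's exact example is precisely what the thread debates; struct foo, gp, printk(), do_something() and the RCU calls are the hypothetical names from the patch's example:

    /* Hypothetical "as-if" output for the patch's do_something_gp() after the
     * compiler caches the plain load of gp and reuses it for the later access. */
    bool do_something_gp_as_if_transformed(void)
    {
            void *tmp;
            struct foo *p;

            rcu_read_lock();
            tmp = gp;                        /* plain load of gp (data race) */
            printk("Point gp = %p\n", tmp);

            p = (struct foo *)tmp;           /* reuses the cached value instead of
                                              * performing rcu_dereference(gp)    */
            if (p) {
                    do_something(p->a, p->b);
                    rcu_read_unlock();
                    return true;
            }
            rcu_read_unlock();
            return false;
    }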
> 2023年8月3日 下午8:35,Joel Fernandes <joel@joelfernandes.org> 写道: > > > >> On Aug 3, 2023, at 8:09 AM, Alan Huang <mmpgouride@gmail.com> wrote: >> >> >>> 2023年8月3日 11:24,Joel Fernandes (Google) <joel@joelfernandes.org> 写道: >>> >>> Add a detailed note to explain the potential side effects of >>> plain-accessing the gp pointer using a plain load, without using the >>> rcu_dereference() macros; which might trip neighboring code that does >>> use rcu_dereference(). >>> >>> I haven't verified this with a compiler, but this is what I gather from >>> the below link using Will's experience with READ_ONCE(). >>> >>> Link: https://lore.kernel.org/all/20230728124412.GA21303@willie-the-truck/ >>> Cc: Will Deacon <will@kernel.org> >>> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> >>> --- >>> .../RCU/Design/Requirements/Requirements.rst | 32 +++++++++++++++++++ >>> 1 file changed, 32 insertions(+) >>> >>> diff --git a/Documentation/RCU/Design/Requirements/Requirements.rst b/Documentation/RCU/Design/Requirements/Requirements.rst >>> index f3b605285a87..e0b896d3fb9b 100644 >>> --- a/Documentation/RCU/Design/Requirements/Requirements.rst >>> +++ b/Documentation/RCU/Design/Requirements/Requirements.rst >>> @@ -376,6 +376,38 @@ mechanism, most commonly locking or reference counting >>> .. |high-quality implementation of C11 memory_order_consume [PDF]| replace:: high-quality implementation of C11 ``memory_order_consume`` [PDF] >>> .. _high-quality implementation of C11 memory_order_consume [PDF]: http://www.rdrop.com/users/paulmck/RCU/consume.2015.07.13a.pdf >>> >>> +Note that, there can be strange side effects (due to compiler optimizations) if >>> +``gp`` is ever accessed using a plain load (i.e. without ``READ_ONCE()`` or >>> +``rcu_dereference()``) potentially hurting any succeeding >>> +``rcu_dereference()``. For example, consider the code: >>> + >>> + :: >>> + >>> + 1 bool do_something_gp(void) >>> + 2 { >>> + 3 void *tmp; >>> + 4 rcu_read_lock(); >>> + 5 tmp = gp; // Plain-load of GP. >>> + 6 printk("Point gp = %p\n", tmp); >>> + 7 >>> + 8 p = rcu_dereference(gp); >>> + 9 if (p) { >>> + 10 do_something(p->a, p->b); >>> + 11 rcu_read_unlock(); >>> + 12 return true; >>> + 13 } >>> + 14 rcu_read_unlock(); >>> + 15 return false; >>> + 16 } >>> + >>> +The behavior of plain accesses involved in a data race is non-deterministic in >>> +the face of compiler optimizations. Since accesses to the ``gp`` pointer is >>> +by-design a data race, the compiler could trip this code by caching the value >>> +of ``gp`` into a register in line 5, and then using the value of the register >>> +to satisfy the load in line 10. Thus it is important to never mix >> >> Will’s example is: >> >> // Assume *ptr is initially 0 and somebody else writes it to 1 >> // concurrently >> >> foo = *ptr; >> bar = READ_ONCE(*ptr); >> baz = *ptr; >> >> Then the compiler is within its right to reorder it to: >> >> foo = *ptr; >> baz = *ptr; >> bar = READ_ONCE(*ptr); >> >> So, the result foo == baz == 0 but bar == 1 is perfectly legal. > > Yes, a bad outcome is perfectly legal amidst data race. Who said it is not legal? My understanding is that it is legal even without data race, and the compiler only keeps the order of volatile access. > >> >> But the example here is different, > > That is intentional. Wills discussion partially triggered this. Though I am wondering > if we should document that as well. 
> >> the compiler can not use the value loaded from line 5 >> unless the compiler can deduce that the tmp is equals to p in which case the address dependency >> doesn’t exist anymore. >> >> What am I missing here? > > Maybe you are trying to rationalize too much that the sequence mentioned cannot result > in a counter intuitive outcome like I did? > > The point AFAIU is not just about line 10 but that the compiler can replace any of the > lines after the plain access with the cached value. Well, IIUC, according to the C standard, the compiler can do anything if there is a data race (undefined behavior). However, what if a write is not protected with WRITE_ONCE and the read is marked with READ_ONCE? That’s also a data race, right? But the kernel considers it is Okay if the write is machine word aligned. > > Thanks. > > > >> >>> +plain accesses of a memory location with rcu_dereference() of the same memory >>> +location, in code involved in a data race. >>> + >>> In short, updaters use rcu_assign_pointer() and readers use >>> rcu_dereference(), and these two RCU API elements work together to >>> ensure that readers have a consistent view of newly added data elements. >>> -- >>> 2.41.0.585.gd2178a4bd4-goog
On Thu, Aug 3, 2023 at 9:36 AM Alan Huang <mmpgouride@gmail.com> wrote: > > > > 2023年8月3日 下午8:35,Joel Fernandes <joel@joelfernandes.org> 写道: > > > > > > > >> On Aug 3, 2023, at 8:09 AM, Alan Huang <mmpgouride@gmail.com> wrote: > >> > >> > >>> 2023年8月3日 11:24,Joel Fernandes (Google) <joel@joelfernandes.org> 写道: > >>> > >>> Add a detailed note to explain the potential side effects of > >>> plain-accessing the gp pointer using a plain load, without using the > >>> rcu_dereference() macros; which might trip neighboring code that does > >>> use rcu_dereference(). > >>> > >>> I haven't verified this with a compiler, but this is what I gather from > >>> the below link using Will's experience with READ_ONCE(). > >>> > >>> Link: https://lore.kernel.org/all/20230728124412.GA21303@willie-the-truck/ > >>> Cc: Will Deacon <will@kernel.org> > >>> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> > >>> --- > >>> .../RCU/Design/Requirements/Requirements.rst | 32 +++++++++++++++++++ > >>> 1 file changed, 32 insertions(+) > >>> > >>> diff --git a/Documentation/RCU/Design/Requirements/Requirements.rst b/Documentation/RCU/Design/Requirements/Requirements.rst > >>> index f3b605285a87..e0b896d3fb9b 100644 > >>> --- a/Documentation/RCU/Design/Requirements/Requirements.rst > >>> +++ b/Documentation/RCU/Design/Requirements/Requirements.rst > >>> @@ -376,6 +376,38 @@ mechanism, most commonly locking or reference counting > >>> .. |high-quality implementation of C11 memory_order_consume [PDF]| replace:: high-quality implementation of C11 ``memory_order_consume`` [PDF] > >>> .. _high-quality implementation of C11 memory_order_consume [PDF]: http://www.rdrop.com/users/paulmck/RCU/consume.2015.07.13a.pdf > >>> > >>> +Note that, there can be strange side effects (due to compiler optimizations) if > >>> +``gp`` is ever accessed using a plain load (i.e. without ``READ_ONCE()`` or > >>> +``rcu_dereference()``) potentially hurting any succeeding > >>> +``rcu_dereference()``. For example, consider the code: > >>> + > >>> + :: > >>> + > >>> + 1 bool do_something_gp(void) > >>> + 2 { > >>> + 3 void *tmp; > >>> + 4 rcu_read_lock(); > >>> + 5 tmp = gp; // Plain-load of GP. > >>> + 6 printk("Point gp = %p\n", tmp); > >>> + 7 > >>> + 8 p = rcu_dereference(gp); > >>> + 9 if (p) { > >>> + 10 do_something(p->a, p->b); > >>> + 11 rcu_read_unlock(); > >>> + 12 return true; > >>> + 13 } > >>> + 14 rcu_read_unlock(); > >>> + 15 return false; > >>> + 16 } > >>> + > >>> +The behavior of plain accesses involved in a data race is non-deterministic in > >>> +the face of compiler optimizations. Since accesses to the ``gp`` pointer is > >>> +by-design a data race, the compiler could trip this code by caching the value > >>> +of ``gp`` into a register in line 5, and then using the value of the register > >>> +to satisfy the load in line 10. Thus it is important to never mix > >> > >> Will’s example is: > >> > >> // Assume *ptr is initially 0 and somebody else writes it to 1 > >> // concurrently > >> > >> foo = *ptr; > >> bar = READ_ONCE(*ptr); > >> baz = *ptr; > >> > >> Then the compiler is within its right to reorder it to: > >> > >> foo = *ptr; > >> baz = *ptr; > >> bar = READ_ONCE(*ptr); > >> > >> So, the result foo == baz == 0 but bar == 1 is perfectly legal. > > > > Yes, a bad outcome is perfectly legal amidst data race. Who said it is not legal? > > My understanding is that it is legal even without data race, and the compiler only keeps the order of volatile access. 
Yes, but I can bet on it the author of the code would not have intended such an outcome, if they did then Will wouldn't have been debugging it ;-). That's why I called it a bad outcome. The goal of this patch is to document such a possible unintentional outcome. > >> But the example here is different, > > > > That is intentional. Wills discussion partially triggered this. Though I am wondering > > if we should document that as well. > > > >> the compiler can not use the value loaded from line 5 > >> unless the compiler can deduce that the tmp is equals to p in which case the address dependency > >> doesn’t exist anymore. > >> > >> What am I missing here? > > > > Maybe you are trying to rationalize too much that the sequence mentioned cannot result > > in a counter intuitive outcome like I did? > > > > The point AFAIU is not just about line 10 but that the compiler can replace any of the > > lines after the plain access with the cached value. > > Well, IIUC, according to the C standard, the compiler can do anything if there is a data race (undefined behavior). > > However, what if a write is not protected with WRITE_ONCE and the read is marked with READ_ONCE? > That’s also a data race, right? But the kernel considers it is Okay if the write is machine word aligned. Yes, but there is a compiler between the HLL code and what the processor sees which can tear the write. How can not using WRITE_ONCE() prevent store-tearing? See [1]. My understanding is that it is OK only if the reader did a NULL check. In that case the torn result will not change the semantics of the program. But otherwise, that's bad. [1] https://lwn.net/Articles/793253/#Store%20Tearing thanks, - Joel > > > > > Thanks. > > > > > > > >> > >>> +plain accesses of a memory location with rcu_dereference() of the same memory > >>> +location, in code involved in a data race. > >>> + > >>> In short, updaters use rcu_assign_pointer() and readers use > >>> rcu_dereference(), and these two RCU API elements work together to > >>> ensure that readers have a consistent view of newly added data elements. > >>> -- > >>> 2.41.0.585.gd2178a4bd4-goog >
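As a hedged illustration of the store-tearing concern behind the LWN reference above, here are simplified stand-ins for the kernel's ONCE macros together with a hypothetical shared pointer. The volatile cast makes the compiler emit a single full-width access, which is what rules out tearing of aligned, machine-word-sized loads and stores:

    /* Simplified stand-ins; the kernel's real macros live in
     * include/asm-generic/rwonce.h and do considerably more. */
    #define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))
    #define READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))

    struct foo *shared_p;   /* hypothetical pointer shared with readers */

    void updater(struct foo *newp)
    {
            /* A plain "shared_p = newp" could in principle be torn into
             * smaller stores; a reader's NULL check would not save it from
             * observing a half-written pointer. */
            WRITE_ONCE(shared_p, newp);
    }

    struct foo *reader(void)
    {
            return READ_ONCE(shared_p);   /* single, untorn load */
    }

Note that real RCU updaters would use rcu_assign_pointer(), which additionally provides the ordering needed to publish the pointed-to data; this sketch only addresses tearing.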
> 2023年8月4日 00:01,Joel Fernandes <joel@joelfernandes.org> 写道: > > On Thu, Aug 3, 2023 at 9:36 AM Alan Huang <mmpgouride@gmail.com> wrote: >> >> >>> 2023年8月3日 下午8:35,Joel Fernandes <joel@joelfernandes.org> 写道: >>> >>> >>> >>>> On Aug 3, 2023, at 8:09 AM, Alan Huang <mmpgouride@gmail.com> wrote: >>>> >>>> >>>>> 2023年8月3日 11:24,Joel Fernandes (Google) <joel@joelfernandes.org> 写道: >>>>> >>>>> Add a detailed note to explain the potential side effects of >>>>> plain-accessing the gp pointer using a plain load, without using the >>>>> rcu_dereference() macros; which might trip neighboring code that does >>>>> use rcu_dereference(). >>>>> >>>>> I haven't verified this with a compiler, but this is what I gather from >>>>> the below link using Will's experience with READ_ONCE(). >>>>> >>>>> Link: https://lore.kernel.org/all/20230728124412.GA21303@willie-the-truck/ >>>>> Cc: Will Deacon <will@kernel.org> >>>>> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> >>>>> --- >>>>> .../RCU/Design/Requirements/Requirements.rst | 32 +++++++++++++++++++ >>>>> 1 file changed, 32 insertions(+) >>>>> >>>>> diff --git a/Documentation/RCU/Design/Requirements/Requirements.rst b/Documentation/RCU/Design/Requirements/Requirements.rst >>>>> index f3b605285a87..e0b896d3fb9b 100644 >>>>> --- a/Documentation/RCU/Design/Requirements/Requirements.rst >>>>> +++ b/Documentation/RCU/Design/Requirements/Requirements.rst >>>>> @@ -376,6 +376,38 @@ mechanism, most commonly locking or reference counting >>>>> .. |high-quality implementation of C11 memory_order_consume [PDF]| replace:: high-quality implementation of C11 ``memory_order_consume`` [PDF] >>>>> .. _high-quality implementation of C11 memory_order_consume [PDF]: http://www.rdrop.com/users/paulmck/RCU/consume.2015.07.13a.pdf >>>>> >>>>> +Note that, there can be strange side effects (due to compiler optimizations) if >>>>> +``gp`` is ever accessed using a plain load (i.e. without ``READ_ONCE()`` or >>>>> +``rcu_dereference()``) potentially hurting any succeeding >>>>> +``rcu_dereference()``. For example, consider the code: >>>>> + >>>>> + :: >>>>> + >>>>> + 1 bool do_something_gp(void) >>>>> + 2 { >>>>> + 3 void *tmp; >>>>> + 4 rcu_read_lock(); >>>>> + 5 tmp = gp; // Plain-load of GP. >>>>> + 6 printk("Point gp = %p\n", tmp); >>>>> + 7 >>>>> + 8 p = rcu_dereference(gp); >>>>> + 9 if (p) { >>>>> + 10 do_something(p->a, p->b); >>>>> + 11 rcu_read_unlock(); >>>>> + 12 return true; >>>>> + 13 } >>>>> + 14 rcu_read_unlock(); >>>>> + 15 return false; >>>>> + 16 } >>>>> + >>>>> +The behavior of plain accesses involved in a data race is non-deterministic in >>>>> +the face of compiler optimizations. Since accesses to the ``gp`` pointer is >>>>> +by-design a data race, the compiler could trip this code by caching the value >>>>> +of ``gp`` into a register in line 5, and then using the value of the register >>>>> +to satisfy the load in line 10. Thus it is important to never mix >>>> >>>> Will’s example is: >>>> >>>> // Assume *ptr is initially 0 and somebody else writes it to 1 >>>> // concurrently >>>> >>>> foo = *ptr; >>>> bar = READ_ONCE(*ptr); >>>> baz = *ptr; >>>> >>>> Then the compiler is within its right to reorder it to: >>>> >>>> foo = *ptr; >>>> baz = *ptr; >>>> bar = READ_ONCE(*ptr); >>>> >>>> So, the result foo == baz == 0 but bar == 1 is perfectly legal. >>> >>> Yes, a bad outcome is perfectly legal amidst data race. Who said it is not legal? 
>> >> My understanding is that it is legal even without data race, and the compiler only keeps the order of volatile access. > > Yes, but I can bet on it the author of the code would not have > intended such an outcome, if they did then Will wouldn't have been > debugging it ;-). That's why I called it a bad outcome. The goal of > this patch is to document such a possible unintentional outcome. > >>>> But the example here is different, >>> >>> That is intentional. Wills discussion partially triggered this. Though I am wondering >>> if we should document that as well. >>> >>>> the compiler can not use the value loaded from line 5 >>>> unless the compiler can deduce that the tmp is equals to p in which case the address dependency >>>> doesn’t exist anymore. >>>> >>>> What am I missing here? >>> >>> Maybe you are trying to rationalize too much that the sequence mentioned cannot result >>> in a counter intuitive outcome like I did? >>> >>> The point AFAIU is not just about line 10 but that the compiler can replace any of the >>> lines after the plain access with the cached value. >> >> Well, IIUC, according to the C standard, the compiler can do anything if there is a data race (undefined behavior). >> >> However, what if a write is not protected with WRITE_ONCE and the read is marked with READ_ONCE? >> That’s also a data race, right? But the kernel considers it is Okay if the write is machine word aligned. > > Yes, but there is a compiler between the HLL code and what the > processor sees which can tear the write. How can not using > WRITE_ONCE() prevent store-tearing? See [1]. My understanding is that > it is OK only if the reader did a NULL check. In that case the torn Yes, a write-write data race where the value is the same is also fine. But they are still data race, if the compiler is within its right to do anything it likes (due to data race), we still need WRITE_ONCE() in these cases, though it’s semantically safe. IIUC, even with _ONCE(), the compiler is within its right do anything according to the standard (at least before the upcoming C23), because the standard doesn’t consider a volatile access to be atomic. However, the kernel consider the volatile access to be atomic, right? BTW, line 5 in the example is likely to be optimized away. And yes, the compiler can cache the value loaded from line 5 from the perspective of undefined behavior, even if I believe it would be a compiler bug from the perspective of kernel. > result will not change the semantics of the program. But otherwise, > that's bad. > > [1] https://lwn.net/Articles/793253/#Store%20Tearing > > thanks, > > - Joel > > >> >>> >>> Thanks. >>> >>> >>> >>>> >>>>> +plain accesses of a memory location with rcu_dereference() of the same memory >>>>> +location, in code involved in a data race. >>>>> + >>>>> In short, updaters use rcu_assign_pointer() and readers use >>>>> rcu_dereference(), and these two RCU API elements work together to >>>>> ensure that readers have a consistent view of newly added data elements. >>>>> -- >>>>> 2.41.0.585.gd2178a4bd4-goog
> On Aug 3, 2023, at 3:26 PM, Alan Huang <mmpgouride@gmail.com> wrote: > > >> 2023年8月4日 00:01,Joel Fernandes <joel@joelfernandes.org> 写道: >> >>> On Thu, Aug 3, 2023 at 9:36 AM Alan Huang <mmpgouride@gmail.com> wrote: >>> >>> >>>> 2023年8月3日 下午8:35,Joel Fernandes <joel@joelfernandes.org> 写道: >>>> >>>> >>>> >>>>> On Aug 3, 2023, at 8:09 AM, Alan Huang <mmpgouride@gmail.com> wrote: >>>>> >>>>> >>>>>> 2023年8月3日 11:24,Joel Fernandes (Google) <joel@joelfernandes.org> 写道: >>>>>> >>>>>> Add a detailed note to explain the potential side effects of >>>>>> plain-accessing the gp pointer using a plain load, without using the >>>>>> rcu_dereference() macros; which might trip neighboring code that does >>>>>> use rcu_dereference(). >>>>>> >>>>>> I haven't verified this with a compiler, but this is what I gather from >>>>>> the below link using Will's experience with READ_ONCE(). >>>>>> >>>>>> Link: https://lore.kernel.org/all/20230728124412.GA21303@willie-the-truck/ >>>>>> Cc: Will Deacon <will@kernel.org> >>>>>> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> >>>>>> --- >>>>>> .../RCU/Design/Requirements/Requirements.rst | 32 +++++++++++++++++++ >>>>>> 1 file changed, 32 insertions(+) >>>>>> >>>>>> diff --git a/Documentation/RCU/Design/Requirements/Requirements.rst b/Documentation/RCU/Design/Requirements/Requirements.rst >>>>>> index f3b605285a87..e0b896d3fb9b 100644 >>>>>> --- a/Documentation/RCU/Design/Requirements/Requirements.rst >>>>>> +++ b/Documentation/RCU/Design/Requirements/Requirements.rst >>>>>> @@ -376,6 +376,38 @@ mechanism, most commonly locking or reference counting >>>>>> .. |high-quality implementation of C11 memory_order_consume [PDF]| replace:: high-quality implementation of C11 ``memory_order_consume`` [PDF] >>>>>> .. _high-quality implementation of C11 memory_order_consume [PDF]: http://www.rdrop.com/users/paulmck/RCU/consume.2015.07.13a.pdf >>>>>> >>>>>> +Note that, there can be strange side effects (due to compiler optimizations) if >>>>>> +``gp`` is ever accessed using a plain load (i.e. without ``READ_ONCE()`` or >>>>>> +``rcu_dereference()``) potentially hurting any succeeding >>>>>> +``rcu_dereference()``. For example, consider the code: >>>>>> + >>>>>> + :: >>>>>> + >>>>>> + 1 bool do_something_gp(void) >>>>>> + 2 { >>>>>> + 3 void *tmp; >>>>>> + 4 rcu_read_lock(); >>>>>> + 5 tmp = gp; // Plain-load of GP. >>>>>> + 6 printk("Point gp = %p\n", tmp); >>>>>> + 7 >>>>>> + 8 p = rcu_dereference(gp); >>>>>> + 9 if (p) { >>>>>> + 10 do_something(p->a, p->b); >>>>>> + 11 rcu_read_unlock(); >>>>>> + 12 return true; >>>>>> + 13 } >>>>>> + 14 rcu_read_unlock(); >>>>>> + 15 return false; >>>>>> + 16 } >>>>>> + >>>>>> +The behavior of plain accesses involved in a data race is non-deterministic in >>>>>> +the face of compiler optimizations. Since accesses to the ``gp`` pointer is >>>>>> +by-design a data race, the compiler could trip this code by caching the value >>>>>> +of ``gp`` into a register in line 5, and then using the value of the register >>>>>> +to satisfy the load in line 10. Thus it is important to never mix >>>>> >>>>> Will’s example is: >>>>> >>>>> // Assume *ptr is initially 0 and somebody else writes it to 1 >>>>> // concurrently >>>>> >>>>> foo = *ptr; >>>>> bar = READ_ONCE(*ptr); >>>>> baz = *ptr; >>>>> >>>>> Then the compiler is within its right to reorder it to: >>>>> >>>>> foo = *ptr; >>>>> baz = *ptr; >>>>> bar = READ_ONCE(*ptr); >>>>> >>>>> So, the result foo == baz == 0 but bar == 1 is perfectly legal. 
>>>> >>>> Yes, a bad outcome is perfectly legal amidst data race. Who said it is not legal? >>> >>> My understanding is that it is legal even without data race, and the compiler only keeps the order of volatile access. >> >> Yes, but I can bet on it the author of the code would not have >> intended such an outcome, if they did then Will wouldn't have been >> debugging it ;-). That's why I called it a bad outcome. The goal of >> this patch is to document such a possible unintentional outcome. Please trim replies if possible. >> >>>>> But the example here is different, >>>> >>>> That is intentional. Wills discussion partially triggered this. Though I am wondering >>>> if we should document that as well. >>>> >>>>> the compiler can not use the value loaded from line 5 >>>>> unless the compiler can deduce that the tmp is equals to p in which case the address dependency >>>>> doesn’t exist anymore. >>>>> >>>>> What am I missing here? >>>> >>>> Maybe you are trying to rationalize too much that the sequence mentioned cannot result >>>> in a counter intuitive outcome like I did? >>>> >>>> The point AFAIU is not just about line 10 but that the compiler can replace any of the >>>> lines after the plain access with the cached value. >>> >>> Well, IIUC, according to the C standard, the compiler can do anything if there is a data race (undefined behavior). >>> >>> However, what if a write is not protected with WRITE_ONCE and the read is marked with READ_ONCE? >>> That’s also a data race, right? But the kernel considers it is Okay if the write is machine word aligned. >> >> Yes, but there is a compiler between the HLL code and what the >> processor sees which can tear the write. How can not using >> WRITE_ONCE() prevent store-tearing? See [1]. My understanding is that >> it is OK only if the reader did a NULL check. In that case the torn > > Yes, a write-write data race where the value is the same is also fine. > > But they are still data race, if the compiler is within its right to do anything it likes (due to data race), > we still need WRITE_ONCE() in these cases, though it’s semantically safe. > > IIUC, even with _ONCE(), the compiler is within its right do anything according to the standard (at least before the upcoming C23), because the standard doesn’t consider a volatile access to be atomic. > > However, the kernel consider the volatile access to be atomic, right? > > BTW, line 5 in the example is likely to be optimized away. And yes, the compiler can cache the value loaded from line 5 from the perspective of undefined behavior, even if I believe it would be a compiler bug from the perspective of kernel. I am actually a bit lost with what you are trying to say. Are you saying that mixing plain accesses with marked accesses is an acceptable practice? I would like others to weight in as well since I am not seeing what Alan is suggesting. AFAICS, in the absence of barrier(), any optimization caused by plain access makes it a bad practice to mix it. Thanks, - Joel > >> result will not change the semantics of the program. But otherwise, >> that's bad. >> >> [1] https://lwn.net/Articles/793253/#Store%20Tearing >> >> thanks, >> >> - Joel >> >> >>> >>>> >>>> Thanks. >>>> >>>> >>>> >>>>> >>>>>> +plain accesses of a memory location with rcu_dereference() of the same memory >>>>>> +location, in code involved in a data race. 
>>>>>> + >>>>>> In short, updaters use rcu_assign_pointer() and readers use >>>>>> rcu_dereference(), and these two RCU API elements work together to >>>>>> ensure that readers have a consistent view of newly added data elements. >>>>>> -- >>>>>> 2.41.0.585.gd2178a4bd4-goog > >
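Joel's reference to barrier() above is the kernel's compiler barrier. As a hedged sketch (again using the hypothetical gp, struct foo and do_something() from the patch), a barrier() between the plain load and the rcu_dereference() would keep the compiler from carrying the cached value across it, although the thread's practical advice remains simply not to mix plain and marked accesses:

    /* The kernel's compiler barrier: the compiler must assume all memory may
     * have changed, so register-cached values are not reused across it. */
    #define barrier() __asm__ __volatile__("" : : : "memory")

    bool do_something_gp_with_barrier(void)
    {
            void *tmp;
            struct foo *p;

            rcu_read_lock();
            tmp = gp;                 /* plain load, still a data race        */
            printk("Point gp = %p\n", tmp);

            barrier();                /* discard any cached value of gp ...   */

            p = rcu_dereference(gp);  /* ... so this must load gp afresh      */
            if (p) {
                    do_something(p->a, p->b);
                    rcu_read_unlock();
                    return true;
            }
            rcu_read_unlock();
            return false;
    }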
On Fri, Aug 04, 2023 at 03:25:57AM +0800, Alan Huang wrote: > > 2023年8月4日 00:01,Joel Fernandes <joel@joelfernandes.org> 写道: > > On Thu, Aug 3, 2023 at 9:36 AM Alan Huang <mmpgouride@gmail.com> wrote: > >>> 2023年8月3日 下午8:35,Joel Fernandes <joel@joelfernandes.org> 写道: > >>>> On Aug 3, 2023, at 8:09 AM, Alan Huang <mmpgouride@gmail.com> wrote: > >>>>> 2023年8月3日 11:24,Joel Fernandes (Google) <joel@joelfernandes.org> 写道: > >>>>> Add a detailed note to explain the potential side effects of > >>>>> plain-accessing the gp pointer using a plain load, without using the > >>>>> rcu_dereference() macros; which might trip neighboring code that does > >>>>> use rcu_dereference(). > >>>>> > >>>>> I haven't verified this with a compiler, but this is what I gather from > >>>>> the below link using Will's experience with READ_ONCE(). > >>>>> > >>>>> Link: https://lore.kernel.org/all/20230728124412.GA21303@willie-the-truck/ > >>>>> Cc: Will Deacon <will@kernel.org> > >>>>> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> > >>>>> --- > >>>>> .../RCU/Design/Requirements/Requirements.rst | 32 +++++++++++++++++++ > >>>>> 1 file changed, 32 insertions(+) > >>>>> > >>>>> diff --git a/Documentation/RCU/Design/Requirements/Requirements.rst b/Documentation/RCU/Design/Requirements/Requirements.rst > >>>>> index f3b605285a87..e0b896d3fb9b 100644 > >>>>> --- a/Documentation/RCU/Design/Requirements/Requirements.rst > >>>>> +++ b/Documentation/RCU/Design/Requirements/Requirements.rst > >>>>> @@ -376,6 +376,38 @@ mechanism, most commonly locking or reference counting > >>>>> .. |high-quality implementation of C11 memory_order_consume [PDF]| replace:: high-quality implementation of C11 ``memory_order_consume`` [PDF] > >>>>> .. _high-quality implementation of C11 memory_order_consume [PDF]: http://www.rdrop.com/users/paulmck/RCU/consume.2015.07.13a.pdf > >>>>> > >>>>> +Note that, there can be strange side effects (due to compiler optimizations) if > >>>>> +``gp`` is ever accessed using a plain load (i.e. without ``READ_ONCE()`` or > >>>>> +``rcu_dereference()``) potentially hurting any succeeding > >>>>> +``rcu_dereference()``. For example, consider the code: > >>>>> + > >>>>> + :: > >>>>> + > >>>>> + 1 bool do_something_gp(void) > >>>>> + 2 { > >>>>> + 3 void *tmp; > >>>>> + 4 rcu_read_lock(); > >>>>> + 5 tmp = gp; // Plain-load of GP. > >>>>> + 6 printk("Point gp = %p\n", tmp); > >>>>> + 7 > >>>>> + 8 p = rcu_dereference(gp); > >>>>> + 9 if (p) { > >>>>> + 10 do_something(p->a, p->b); > >>>>> + 11 rcu_read_unlock(); > >>>>> + 12 return true; > >>>>> + 13 } > >>>>> + 14 rcu_read_unlock(); > >>>>> + 15 return false; > >>>>> + 16 } > >>>>> + > >>>>> +The behavior of plain accesses involved in a data race is non-deterministic in > >>>>> +the face of compiler optimizations. Since accesses to the ``gp`` pointer is > >>>>> +by-design a data race, the compiler could trip this code by caching the value > >>>>> +of ``gp`` into a register in line 5, and then using the value of the register > >>>>> +to satisfy the load in line 10. Thus it is important to never mix > >>>> > >>>> Will’s example is: > >>>> > >>>> // Assume *ptr is initially 0 and somebody else writes it to 1 > >>>> // concurrently > >>>> > >>>> foo = *ptr; > >>>> bar = READ_ONCE(*ptr); > >>>> baz = *ptr; > >>>> > >>>> Then the compiler is within its right to reorder it to: > >>>> > >>>> foo = *ptr; > >>>> baz = *ptr; > >>>> bar = READ_ONCE(*ptr); > >>>> > >>>> So, the result foo == baz == 0 but bar == 1 is perfectly legal. 
> >>> > >>> Yes, a bad outcome is perfectly legal amidst data race. Who said it is not legal? > >> > >> My understanding is that it is legal even without data race, and the compiler only keeps the order of volatile access. > > > > Yes, but I can bet on it the author of the code would not have > > intended such an outcome, if they did then Will wouldn't have been > > debugging it ;-). That's why I called it a bad outcome. The goal of > > this patch is to document such a possible unintentional outcome. > > > >>>> But the example here is different, > >>> > >>> That is intentional. Wills discussion partially triggered this. Though I am wondering > >>> if we should document that as well. > >>> > >>>> the compiler can not use the value loaded from line 5 > >>>> unless the compiler can deduce that the tmp is equals to p in which case the address dependency > >>>> doesn’t exist anymore. > >>>> > >>>> What am I missing here? > >>> > >>> Maybe you are trying to rationalize too much that the sequence mentioned cannot result > >>> in a counter intuitive outcome like I did? > >>> > >>> The point AFAIU is not just about line 10 but that the compiler can replace any of the > >>> lines after the plain access with the cached value. > >> > >> Well, IIUC, according to the C standard, the compiler can do anything if there is a data race (undefined behavior). > >> > >> However, what if a write is not protected with WRITE_ONCE and the read is marked with READ_ONCE? > >> That’s also a data race, right? But the kernel considers it is Okay if the write is machine word aligned. > > > > Yes, but there is a compiler between the HLL code and what the > > processor sees which can tear the write. How can not using > > WRITE_ONCE() prevent store-tearing? See [1]. My understanding is that > > it is OK only if the reader did a NULL check. In that case the torn > > Yes, a write-write data race where the value is the same is also fine. > > But they are still data race, if the compiler is within its right to do anything it likes (due to data race), > we still need WRITE_ONCE() in these cases, though it’s semantically safe. > > IIUC, even with _ONCE(), the compiler is within its right do anything according to the standard (at least before the upcoming C23), because the standard doesn’t consider a volatile access to be atomic. Volatile accesses are not specified very well in the standard. However, as a practical matter, compilers that wish to be able to device drivers (whether in kernels or userspace applications) must compile those volatile accesses in such a way to allow reliable device drivers to be written. > However, the kernel consider the volatile access to be atomic, right? The compiler must therefore act as if a volatile access to an aligned machine-word size location is atomic. To see this, consider accesses to memory that is shared by a device driver and that device's firmware, both of which are written in either C or C++. Does that help? Thanx, Paul > BTW, line 5 in the example is likely to be optimized away. And yes, the compiler can cache the value loaded from line 5 from the perspective of undefined behavior, even if I believe it would be a compiler bug from the perspective of kernel. > > > result will not change the semantics of the program. But otherwise, > > that's bad. > > > > [1] https://lwn.net/Articles/793253/#Store%20Tearing > > > > thanks, > > > > - Joel > > > > > >> > >>> > >>> Thanks. 
> >>> > >>> > >>> > >>>> > >>>>> +plain accesses of a memory location with rcu_dereference() of the same memory > >>>>> +location, in code involved in a data race. > >>>>> + > >>>>> In short, updaters use rcu_assign_pointer() and readers use > >>>>> rcu_dereference(), and these two RCU API elements work together to > >>>>> ensure that readers have a consistent view of newly added data elements. > >>>>> -- > >>>>> 2.41.0.585.gd2178a4bd4-goog > >
> On Aug 3, 2023, at 8:01 PM, Paul E. McKenney <paulmck@kernel.org> wrote: > > On Fri, Aug 04, 2023 at 03:25:57AM +0800, Alan Huang wrote: >>> 2023年8月4日 00:01,Joel Fernandes <joel@joelfernandes.org> 写道: >>> On Thu, Aug 3, 2023 at 9:36 AM Alan Huang <mmpgouride@gmail.com> wrote: >>>>> 2023年8月3日 下午8:35,Joel Fernandes <joel@joelfernandes.org> 写道: >>>>>>> On Aug 3, 2023, at 8:09 AM, Alan Huang <mmpgouride@gmail.com> wrote: >>>>>>>> 2023年8月3日 11:24,Joel Fernandes (Google) <joel@joelfernandes.org> 写道: >>>>>>>> Add a detailed note to explain the potential side effects of >>>>>>>> plain-accessing the gp pointer using a plain load, without using the >>>>>>>> rcu_dereference() macros; which might trip neighboring code that does >>>>>>>> use rcu_dereference(). >>>>>>>> >>>>>>>> I haven't verified this with a compiler, but this is what I gather from >>>>>>>> the below link using Will's experience with READ_ONCE(). >>>>>>>> >>>>>>>> Link: https://lore.kernel.org/all/20230728124412.GA21303@willie-the-truck/ >>>>>>>> Cc: Will Deacon <will@kernel.org> >>>>>>>> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> >>>>>>>> --- >>>>>>>> .../RCU/Design/Requirements/Requirements.rst | 32 +++++++++++++++++++ >>>>>>>> 1 file changed, 32 insertions(+) >>>>>>>> >>>>>>>> diff --git a/Documentation/RCU/Design/Requirements/Requirements.rst b/Documentation/RCU/Design/Requirements/Requirements.rst >>>>>>>> index f3b605285a87..e0b896d3fb9b 100644 >>>>>>>> --- a/Documentation/RCU/Design/Requirements/Requirements.rst >>>>>>>> +++ b/Documentation/RCU/Design/Requirements/Requirements.rst >>>>>>>> @@ -376,6 +376,38 @@ mechanism, most commonly locking or reference counting >>>>>>>> .. |high-quality implementation of C11 memory_order_consume [PDF]| replace:: high-quality implementation of C11 ``memory_order_consume`` [PDF] >>>>>>>> .. _high-quality implementation of C11 memory_order_consume [PDF]: http://www.rdrop.com/users/paulmck/RCU/consume.2015.07.13a.pdf >>>>>>>> >>>>>>>> +Note that, there can be strange side effects (due to compiler optimizations) if >>>>>>>> +``gp`` is ever accessed using a plain load (i.e. without ``READ_ONCE()`` or >>>>>>>> +``rcu_dereference()``) potentially hurting any succeeding >>>>>>>> +``rcu_dereference()``. For example, consider the code: >>>>>>>> + >>>>>>>> + :: >>>>>>>> + >>>>>>>> + 1 bool do_something_gp(void) >>>>>>>> + 2 { >>>>>>>> + 3 void *tmp; >>>>>>>> + 4 rcu_read_lock(); >>>>>>>> + 5 tmp = gp; // Plain-load of GP. >>>>>>>> + 6 printk("Point gp = %p\n", tmp); >>>>>>>> + 7 >>>>>>>> + 8 p = rcu_dereference(gp); >>>>>>>> + 9 if (p) { >>>>>>>> + 10 do_something(p->a, p->b); >>>>>>>> + 11 rcu_read_unlock(); >>>>>>>> + 12 return true; >>>>>>>> + 13 } >>>>>>>> + 14 rcu_read_unlock(); >>>>>>>> + 15 return false; >>>>>>>> + 16 } >>>>>>>> + >>>>>>>> +The behavior of plain accesses involved in a data race is non-deterministic in >>>>>>>> +the face of compiler optimizations. Since accesses to the ``gp`` pointer is >>>>>>>> +by-design a data race, the compiler could trip this code by caching the value >>>>>>>> +of ``gp`` into a register in line 5, and then using the value of the register >>>>>>>> +to satisfy the load in line 10. 
Thus it is important to never mix >>>>>>> >>>>>>> Will’s example is: >>>>>>> >>>>>>> // Assume *ptr is initially 0 and somebody else writes it to 1 >>>>>>> // concurrently >>>>>>> >>>>>>> foo = *ptr; >>>>>>> bar = READ_ONCE(*ptr); >>>>>>> baz = *ptr; >>>>>>> >>>>>>> Then the compiler is within its right to reorder it to: >>>>>>> >>>>>>> foo = *ptr; >>>>>>> baz = *ptr; >>>>>>> bar = READ_ONCE(*ptr); >>>>>>> >>>>>>> So, the result foo == baz == 0 but bar == 1 is perfectly legal. >>>>>> >>>>>> Yes, a bad outcome is perfectly legal amidst data race. Who said it is not legal? >>>>> >>>>> My understanding is that it is legal even without data race, and the compiler only keeps the order of volatile access. >>> >>> Yes, but I can bet on it the author of the code would not have >>> intended such an outcome, if they did then Will wouldn't have been >>> debugging it ;-). That's why I called it a bad outcome. The goal of >>> this patch is to document such a possible unintentional outcome. >>> >>>>>> But the example here is different, >>>>> >>>>> That is intentional. Wills discussion partially triggered this. Though I am wondering >>>>> if we should document that as well. >>>>> >>>>>> the compiler can not use the value loaded from line 5 >>>>>> unless the compiler can deduce that the tmp is equals to p in which case the address dependency >>>>>> doesn’t exist anymore. >>>>>> >>>>>> What am I missing here? >>>>> >>>>> Maybe you are trying to rationalize too much that the sequence mentioned cannot result >>>>> in a counter intuitive outcome like I did? >>>>> >>>>> The point AFAIU is not just about line 10 but that the compiler can replace any of the >>>>> lines after the plain access with the cached value. >>>> >>>> Well, IIUC, according to the C standard, the compiler can do anything if there is a data race (undefined behavior). >>>> >>>> However, what if a write is not protected with WRITE_ONCE and the read is marked with READ_ONCE? >>>> That’s also a data race, right? But the kernel considers it is Okay if the write is machine word aligned. >>> >>> Yes, but there is a compiler between the HLL code and what the >>> processor sees which can tear the write. How can not using >>> WRITE_ONCE() prevent store-tearing? See [1]. My understanding is that >>> it is OK only if the reader did a NULL check. In that case the torn >> >> Yes, a write-write data race where the value is the same is also fine. >> >> But they are still data race, if the compiler is within its right to do anything it likes (due to data race), >> we still need WRITE_ONCE() in these cases, though it’s semantically safe. >> >> IIUC, even with _ONCE(), the compiler is within its right do anything according to the standard (at least before the upcoming C23), because the standard doesn’t consider a volatile access to be atomic. > > Volatile accesses are not specified very well in the standard. However, > as a practical matter, compilers that wish to be able to device drivers > (whether in kernels or userspace applications) must compile those volatile > accesses in such a way to allow reliable device drivers to be written. Agreed. > >> However, the kernel consider the volatile access to be atomic, right? > > The compiler must therefore act as if a volatile access to an aligned > machine-word size location is atomic. To see this, consider accesses > to memory that is shared by a device driver and that device's firmware, > both of which are written in either C or C++. Btw it appears TSAN complaints bitterly on even volatile 4 byte data races. 
Hence we have to explicitly use atomic API for data race accesses in Chrome. Thanks, Joel > Does that help? > > Thanx, Paul > >> BTW, line 5 in the example is likely to be optimized away. And yes, the compiler can cache the value loaded from line 5 from the perspective of undefined behavior, even if I believe it would be a compiler bug from the perspective of kernel. >> >>> result will not change the semantics of the program. But otherwise, >>> that's bad. >>> >>> [1] https://lwn.net/Articles/793253/#Store%20Tearing >>> >>> thanks, >>> >>> - Joel >>> >>> >>>> >>>>> >>>>> Thanks. >>>>> >>>>> >>>>> >>>>>> >>>>>>> +plain accesses of a memory location with rcu_dereference() of the same memory >>>>>>> +location, in code involved in a data race. >>>>>>> + >>>>>>> In short, updaters use rcu_assign_pointer() and readers use >>>>>>> rcu_dereference(), and these two RCU API elements work together to >>>>>>> ensure that readers have a consistent view of newly added data elements. >>>>>>> -- >>>>>>> 2.41.0.585.gd2178a4bd4-goog >> >>
On Fri, Aug 04, 2023 at 08:33:55AM -0400, Joel Fernandes wrote: > > On Aug 3, 2023, at 8:01 PM, Paul E. McKenney <paulmck@kernel.org> wrote: > > On Fri, Aug 04, 2023 at 03:25:57AM +0800, Alan Huang wrote: > >>> 2023年8月4日 00:01,Joel Fernandes <joel@joelfernandes.org> 写道: > >>> On Thu, Aug 3, 2023 at 9:36 AM Alan Huang <mmpgouride@gmail.com> wrote: > >>>>> 2023年8月3日 下午8:35,Joel Fernandes <joel@joelfernandes.org> 写道: > >>>>>>> On Aug 3, 2023, at 8:09 AM, Alan Huang <mmpgouride@gmail.com> wrote: > >>>>>>>> 2023年8月3日 11:24,Joel Fernandes (Google) <joel@joelfernandes.org> 写道: > >>>>>>>> Add a detailed note to explain the potential side effects of > >>>>>>>> plain-accessing the gp pointer using a plain load, without using the > >>>>>>>> rcu_dereference() macros; which might trip neighboring code that does > >>>>>>>> use rcu_dereference(). > >>>>>>>> > >>>>>>>> I haven't verified this with a compiler, but this is what I gather from > >>>>>>>> the below link using Will's experience with READ_ONCE(). > >>>>>>>> > >>>>>>>> Link: https://lore.kernel.org/all/20230728124412.GA21303@willie-the-truck/ > >>>>>>>> Cc: Will Deacon <will@kernel.org> > >>>>>>>> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> > >>>>>>>> --- > >>>>>>>> .../RCU/Design/Requirements/Requirements.rst | 32 +++++++++++++++++++ > >>>>>>>> 1 file changed, 32 insertions(+) > >>>>>>>> > >>>>>>>> diff --git a/Documentation/RCU/Design/Requirements/Requirements.rst b/Documentation/RCU/Design/Requirements/Requirements.rst > >>>>>>>> index f3b605285a87..e0b896d3fb9b 100644 > >>>>>>>> --- a/Documentation/RCU/Design/Requirements/Requirements.rst > >>>>>>>> +++ b/Documentation/RCU/Design/Requirements/Requirements.rst > >>>>>>>> @@ -376,6 +376,38 @@ mechanism, most commonly locking or reference counting > >>>>>>>> .. |high-quality implementation of C11 memory_order_consume [PDF]| replace:: high-quality implementation of C11 ``memory_order_consume`` [PDF] > >>>>>>>> .. _high-quality implementation of C11 memory_order_consume [PDF]: http://www.rdrop.com/users/paulmck/RCU/consume.2015.07.13a.pdf > >>>>>>>> > >>>>>>>> +Note that, there can be strange side effects (due to compiler optimizations) if > >>>>>>>> +``gp`` is ever accessed using a plain load (i.e. without ``READ_ONCE()`` or > >>>>>>>> +``rcu_dereference()``) potentially hurting any succeeding > >>>>>>>> +``rcu_dereference()``. For example, consider the code: > >>>>>>>> + > >>>>>>>> + :: > >>>>>>>> + > >>>>>>>> + 1 bool do_something_gp(void) > >>>>>>>> + 2 { > >>>>>>>> + 3 void *tmp; > >>>>>>>> + 4 rcu_read_lock(); > >>>>>>>> + 5 tmp = gp; // Plain-load of GP. > >>>>>>>> + 6 printk("Point gp = %p\n", tmp); > >>>>>>>> + 7 > >>>>>>>> + 8 p = rcu_dereference(gp); > >>>>>>>> + 9 if (p) { > >>>>>>>> + 10 do_something(p->a, p->b); > >>>>>>>> + 11 rcu_read_unlock(); > >>>>>>>> + 12 return true; > >>>>>>>> + 13 } > >>>>>>>> + 14 rcu_read_unlock(); > >>>>>>>> + 15 return false; > >>>>>>>> + 16 } > >>>>>>>> + > >>>>>>>> +The behavior of plain accesses involved in a data race is non-deterministic in > >>>>>>>> +the face of compiler optimizations. Since accesses to the ``gp`` pointer is > >>>>>>>> +by-design a data race, the compiler could trip this code by caching the value > >>>>>>>> +of ``gp`` into a register in line 5, and then using the value of the register > >>>>>>>> +to satisfy the load in line 10. 
Thus it is important to never mix > >>>>>>> > >>>>>>> Will’s example is: > >>>>>>> > >>>>>>> // Assume *ptr is initially 0 and somebody else writes it to 1 > >>>>>>> // concurrently > >>>>>>> > >>>>>>> foo = *ptr; > >>>>>>> bar = READ_ONCE(*ptr); > >>>>>>> baz = *ptr; > >>>>>>> > >>>>>>> Then the compiler is within its right to reorder it to: > >>>>>>> > >>>>>>> foo = *ptr; > >>>>>>> baz = *ptr; > >>>>>>> bar = READ_ONCE(*ptr); > >>>>>>> > >>>>>>> So, the result foo == baz == 0 but bar == 1 is perfectly legal. > >>>>>> > >>>>>> Yes, a bad outcome is perfectly legal amidst data race. Who said it is not legal? > >>>>> > >>>>> My understanding is that it is legal even without data race, and the compiler only keeps the order of volatile access. > >>> > >>> Yes, but I can bet on it the author of the code would not have > >>> intended such an outcome, if they did then Will wouldn't have been > >>> debugging it ;-). That's why I called it a bad outcome. The goal of > >>> this patch is to document such a possible unintentional outcome. > >>> > >>>>>> But the example here is different, > >>>>> > >>>>> That is intentional. Wills discussion partially triggered this. Though I am wondering > >>>>> if we should document that as well. > >>>>> > >>>>>> the compiler can not use the value loaded from line 5 > >>>>>> unless the compiler can deduce that the tmp is equals to p in which case the address dependency > >>>>>> doesn’t exist anymore. > >>>>>> > >>>>>> What am I missing here? > >>>>> > >>>>> Maybe you are trying to rationalize too much that the sequence mentioned cannot result > >>>>> in a counter intuitive outcome like I did? > >>>>> > >>>>> The point AFAIU is not just about line 10 but that the compiler can replace any of the > >>>>> lines after the plain access with the cached value. > >>>> > >>>> Well, IIUC, according to the C standard, the compiler can do anything if there is a data race (undefined behavior). > >>>> > >>>> However, what if a write is not protected with WRITE_ONCE and the read is marked with READ_ONCE? > >>>> That’s also a data race, right? But the kernel considers it is Okay if the write is machine word aligned. > >>> > >>> Yes, but there is a compiler between the HLL code and what the > >>> processor sees which can tear the write. How can not using > >>> WRITE_ONCE() prevent store-tearing? See [1]. My understanding is that > >>> it is OK only if the reader did a NULL check. In that case the torn > >> > >> Yes, a write-write data race where the value is the same is also fine. > >> > >> But they are still data race, if the compiler is within its right to do anything it likes (due to data race), > >> we still need WRITE_ONCE() in these cases, though it’s semantically safe. > >> > >> IIUC, even with _ONCE(), the compiler is within its right do anything according to the standard (at least before the upcoming C23), because the standard doesn’t consider a volatile access to be atomic. > > > > Volatile accesses are not specified very well in the standard. However, > > as a practical matter, compilers that wish to be able to device drivers > > (whether in kernels or userspace applications) must compile those volatile > > accesses in such a way to allow reliable device drivers to be written. > > Agreed. > > > > >> However, the kernel consider the volatile access to be atomic, right? > > > > The compiler must therefore act as if a volatile access to an aligned > > machine-word size location is atomic. 
To see this, consider accesses > > to memory that is shared by a device driver and that device's firmware, > > both of which are written in either C or C++. > > Btw it appears TSAN complaints bitterly on even volatile 4 byte data races. > Hence we have to explicitly use atomic API for data race accesses in Chrome. That might have been a conscious and deliberate choice on the part of the TSAN guys. Volatile does not imply any ordering in the standard (other than the compiler avoiding emitting volatile operations out of order), but some compilers did emit memory-barrier instructions for volatile accesses. Which resulted in a lot of problems when such code found compilers that did not cause the CPU to order volatile operations. So a lot of people decided to thrown the volatile baby out with the unordered bathwather. ;-) Thanx, Paul > Thanks, > Joel > > > > > Does that help? > > > > Thanx, Paul > > > >> BTW, line 5 in the example is likely to be optimized away. And yes, the compiler can cache the value loaded from line 5 from the perspective of undefined behavior, even if I believe it would be a compiler bug from the perspective of kernel. > >> > >>> result will not change the semantics of the program. But otherwise, > >>> that's bad. > >>> > >>> [1] https://lwn.net/Articles/793253/#Store%20Tearing > >>> > >>> thanks, > >>> > >>> - Joel > >>> > >>> > >>>> > >>>>> > >>>>> Thanks. > >>>>> > >>>>> > >>>>> > >>>>>> > >>>>>>> +plain accesses of a memory location with rcu_dereference() of the same memory > >>>>>>> +location, in code involved in a data race. > >>>>>>> + > >>>>>>> In short, updaters use rcu_assign_pointer() and readers use > >>>>>>> rcu_dereference(), and these two RCU API elements work together to > >>>>>>> ensure that readers have a consistent view of newly added data elements. > >>>>>>> -- > >>>>>>> 2.41.0.585.gd2178a4bd4-goog > >> > >>
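For reference, Will's snippet from the linked thread can be written out as a compilable sketch (only the variable names and the outcome come from the discussion; the reader() wrapper and the int type are added here for illustration):

#include <linux/compiler.h>

/* Assume *ptr is initially 0 and another CPU concurrently stores 1.
 * Only the READ_ONCE() is a volatile access, so the compiler may move
 * the plain loads of *ptr around it (or merge them), making the
 * outcome foo == baz == 0 with bar == 1 perfectly legal. */
int foo, bar, baz;

void reader(int *ptr)
{
	foo = *ptr;		/* plain load */
	bar = READ_ONCE(*ptr);	/* marked (volatile) load */
	baz = *ptr;		/* plain load, may be hoisted above bar */
}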
> >>> >>>>>> But the example here is different, >>>>> >>>>> That is intentional. Wills discussion partially triggered this. Though I am wondering >>>>> if we should document that as well. >>>>> >>>>>> the compiler can not use the value loaded from line 5 >>>>>> unless the compiler can deduce that the tmp is equals to p in which case the address dependency >>>>>> doesn’t exist anymore. >>>>>> >>>>>> What am I missing here? >>>>> >>>>> Maybe you are trying to rationalize too much that the sequence mentioned cannot result >>>>> in a counter intuitive outcome like I did? >>>>> >>>>> The point AFAIU is not just about line 10 but that the compiler can replace any of the >>>>> lines after the plain access with the cached value. >>>> >>>> Well, IIUC, according to the C standard, the compiler can do anything if there is a data race (undefined behavior). >>>> >>>> However, what if a write is not protected with WRITE_ONCE and the read is marked with READ_ONCE? >>>> That’s also a data race, right? But the kernel considers it is Okay if the write is machine word aligned. >>> >>> Yes, but there is a compiler between the HLL code and what the >>> processor sees which can tear the write. How can not using >>> WRITE_ONCE() prevent store-tearing? See [1]. My understanding is that >>> it is OK only if the reader did a NULL check. In that case the torn >> >> Yes, a write-write data race where the value is the same is also fine. >> >> But they are still data race, if the compiler is within its right to do anything it likes (due to data race), >> we still need WRITE_ONCE() in these cases, though it’s semantically safe. >> >> IIUC, even with _ONCE(), the compiler is within its right do anything according to the standard (at least before the upcoming C23), because the standard doesn’t consider a volatile access to be atomic. >> >> However, the kernel consider the volatile access to be atomic, right? >> >> BTW, line 5 in the example is likely to be optimized away. And yes, the compiler can cache the value loaded from line 5 from the perspective of undefined behavior, even if I believe it would be a compiler bug from the perspective of kernel. > > I am actually a bit lost with what you are trying to say. Are you saying that mixing > plain accesses with marked accesses is an acceptable practice? I’m trying to say that sometimes data race is fine, that’s why we have the data_race(). Even if the standard says data race results in UB. And IMHO, the possible data race at line 5 in this example is also fine, unless the compiler deduces that the value of gp is always the same. > > I would like others to weight in as well since I am not seeing what Alan is suggesting. > AFAICS, in the absence of barrier(), any optimization caused by plain access > makes it a bad practice to mix it. > > Thanks, > > - Joel > > > >> >>> result will not change the semantics of the program. But otherwise, >>> that's bad. >>> >>> [1] https://lwn.net/Articles/793253/#Store%20Tearing >>> >>> thanks, >>> >>> - Joel >>> >>> >>>> >>>>> >>>>> Thanks. >>>>> >>>>> >>>>> >>>>>> >>>>>>> +plain accesses of a memory location with rcu_dereference() of the same memory >>>>>>> +location, in code involved in a data race. >>>>>>> + >>>>>>> In short, updaters use rcu_assign_pointer() and readers use >>>>>>> rcu_dereference(), and these two RCU API elements work together to >>>>>>> ensure that readers have a consistent view of newly added data elements. >>>>>>> -- >>>>>>> 2.41.0.585.gd2178a4bd4-goog
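A minimal sketch of the store-tearing concern behind the WRITE_ONCE() point quoted above (illustrative only; the names are invented here, and an RCU publisher would of course use rcu_assign_pointer() instead):

#include <linux/compiler.h>

struct foo { int a; int b; };

struct foo *global_ptr;		/* read concurrently, e.g. after a NULL check */

void set_global_ptr(struct foo *p)
{
	/* A plain "global_ptr = p;" leaves the compiler free to tear the
	 * store (for example, by emitting it as two narrower stores), so a
	 * concurrent reader could observe a half-updated pointer even though
	 * it checks for NULL.  WRITE_ONCE() forbids such tearing for an
	 * aligned, machine-word-sized store. */
	WRITE_ONCE(global_ptr, p);
}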
On Fri, Aug 4, 2023 at 11:47 AM Alan Huang <mmpgouride@gmail.com> wrote: > > > > >>> > >>>>>> But the example here is different, > >>>>> > >>>>> That is intentional. Wills discussion partially triggered this. Though I am wondering > >>>>> if we should document that as well. > >>>>> > >>>>>> the compiler can not use the value loaded from line 5 > >>>>>> unless the compiler can deduce that the tmp is equals to p in which case the address dependency > >>>>>> doesn’t exist anymore. > >>>>>> > >>>>>> What am I missing here? > >>>>> > >>>>> Maybe you are trying to rationalize too much that the sequence mentioned cannot result > >>>>> in a counter intuitive outcome like I did? > >>>>> > >>>>> The point AFAIU is not just about line 10 but that the compiler can replace any of the > >>>>> lines after the plain access with the cached value. > >>>> > >>>> Well, IIUC, according to the C standard, the compiler can do anything if there is a data race (undefined behavior). > >>>> > >>>> However, what if a write is not protected with WRITE_ONCE and the read is marked with READ_ONCE? > >>>> That’s also a data race, right? But the kernel considers it is Okay if the write is machine word aligned. > >>> > >>> Yes, but there is a compiler between the HLL code and what the > >>> processor sees which can tear the write. How can not using > >>> WRITE_ONCE() prevent store-tearing? See [1]. My understanding is that > >>> it is OK only if the reader did a NULL check. In that case the torn > >> > >> Yes, a write-write data race where the value is the same is also fine. > >> > >> But they are still data race, if the compiler is within its right to do anything it likes (due to data race), > >> we still need WRITE_ONCE() in these cases, though it’s semantically safe. > >> > >> IIUC, even with _ONCE(), the compiler is within its right do anything according to the standard (at least before the upcoming C23), because the standard doesn’t consider a volatile access to be atomic. > >> > >> However, the kernel consider the volatile access to be atomic, right? > >> > >> BTW, line 5 in the example is likely to be optimized away. And yes, the compiler can cache the value loaded from line 5 from the perspective of undefined behavior, even if I believe it would be a compiler bug from the perspective of kernel. > > > > I am actually a bit lost with what you are trying to say. Are you saying that mixing > > plain accesses with marked accesses is an acceptable practice? > > > I’m trying to say that sometimes data race is fine, that’s why we have the data_race(). > > Even if the standard says data race results in UB. > > And IMHO, the possible data race at line 5 in this example is also fine, unless the compiler > deduces that the value of gp is always the same. IMHO, no one is saying it is not "fine". As in, such behavior is neither a compiler nor strictly a kernel bug. More a wtf that the programmer should know off (does not hurt to know). I will rest my case with AlanH pending any input from people who know more than me. If there is a better way to represent such matters in the docs, I am happy to make changes to this patch. Cheers, - Joel
[...]
> > >> However, the kernel consider the volatile access to be atomic, right?
> > >
> > > The compiler must therefore act as if a volatile access to an aligned
> > > machine-word size location is atomic. To see this, consider accesses
> > > to memory that is shared by a device driver and that device's firmware,
> > > both of which are written in either C or C++.
> >
> > Btw it appears TSAN complaints bitterly on even volatile 4 byte data races.
> > Hence we have to explicitly use atomic API for data race accesses in Chrome.
>
> That might have been a conscious and deliberate choice on the part of
> the TSAN guys. Volatile does not imply any ordering in the standard
> (other than the compiler avoiding emitting volatile operations out of
> order), but some compilers did emit memory-barrier instructions for
> volatile accesses. Which resulted in a lot of problems when such code
> found compilers that did not cause the CPU to order volatile operations.
>
> So a lot of people decided to thrown the volatile baby out with the
> unordered bathwather. ;-)

Thanks for the input, I think TSAN was indeed worried about
memory-ordering even if relaxed ordering was intended. I think there
is a way to tell TSAN to shut-up in such situations but in my last
Chrome sprint, I just used the atomic API with relaxed ordering and
called it a day. :-)

 - Joel
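A rough illustration of the workaround Joel describes, transposed to C11 atomics (the names are made up here; Chrome itself is C++, but the idea is the same):

#include <stdatomic.h>

/* TSAN reports plain (and volatile) concurrent accesses as data races,
 * but accepts relaxed atomics as intentionally unordered accesses. */
static _Atomic int shared_flag;

void writer(void)
{
	atomic_store_explicit(&shared_flag, 1, memory_order_relaxed);
}

int reader(void)
{
	return atomic_load_explicit(&shared_flag, memory_order_relaxed);
}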
>> Yes, a write-write data race where the value is the same is also fine. >> >> But they are still data race, if the compiler is within its right to do anything it likes (due to data race), >> we still need WRITE_ONCE() in these cases, though it’s semantically safe. >> >> IIUC, even with _ONCE(), the compiler is within its right do anything according to the standard (at least before the upcoming C23), because the standard doesn’t consider a volatile access to be atomic. > > Volatile accesses are not specified very well in the standard. However, > as a practical matter, compilers that wish to be able to device drivers > (whether in kernels or userspace applications) must compile those volatile > accesses in such a way to allow reliable device drivers to be written. > >> However, the kernel consider the volatile access to be atomic, right? > > The compiler must therefore act as if a volatile access to an aligned > machine-word size location is atomic. To see this, consider accesses > to memory that is shared by a device driver and that device's firmware, > both of which are written in either C or C++. I learned these things a few months ago. But still thank you! The real problem is that there may be a data race at line 5, so Joel is correct that the compiler can cache the value loaded from line 5 according to the standard given that the standard says that a data race result in undefined behavior, so the compiler might be allowed to do anything. But from the perspective of the kernel, line 5 is likely a diagnostic read, so it’s fine without READ_ONCE() and the compiler is not allowed to cache the value. This situation is like the volatile access. Am I missing something? > > Does that help? > > Thanx, Paul > >> BTW, line 5 in the example is likely to be optimized away. And yes, the compiler can cache the value loaded from line 5 from the perspective of undefined behavior, even if I believe it would be a compiler bug from the perspective of kernel. >> >>> result will not change the semantics of the program. But otherwise, >>> that's bad. >>> >>> [1] https://lwn.net/Articles/793253/#Store%20Tearing >>> >>> thanks, >>> >>> - Joel >>> >>> >>>> >>>>> >>>>> Thanks. >>>>> >>>>> >>>>> >>>>>> >>>>>>> +plain accesses of a memory location with rcu_dereference() of the same memory >>>>>>> +location, in code involved in a data race. >>>>>>> + >>>>>>> In short, updaters use rcu_assign_pointer() and readers use >>>>>>> rcu_dereference(), and these two RCU API elements work together to >>>>>>> ensure that readers have a consistent view of newly added data elements. >>>>>>> -- >>>>>>> 2.41.0.585.gd2178a4bd4-goog
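If the read on line 5 really is only a diagnostic, one way to keep it while giving the compiler no latitude at all is to mark it; a sketch only, not something proposed in the thread, assuming the usual struct foo and gp from the surrounding document:

bool do_something_gp(void)
{
	void *tmp;
	struct foo *p;

	rcu_read_lock();
	tmp = READ_ONCE(gp);		/* marked diagnostic load of gp */
	printk("Point gp = %p\n", tmp);

	p = rcu_dereference(gp);	/* the load the algorithm depends on */
	if (p) {
		do_something(p->a, p->b);
		rcu_read_unlock();
		return true;
	}
	rcu_read_unlock();
	return false;
}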
On Fri, Aug 04, 2023 at 12:17:58PM -0400, Joel Fernandes wrote: > [...] > > > >> However, the kernel consider the volatile access to be atomic, right? > > > > > > > > The compiler must therefore act as if a volatile access to an aligned > > > > machine-word size location is atomic. To see this, consider accesses > > > > to memory that is shared by a device driver and that device's firmware, > > > > both of which are written in either C or C++. > > > > > > Btw it appears TSAN complaints bitterly on even volatile 4 byte data races. > > > Hence we have to explicitly use atomic API for data race accesses in Chrome. > > > > That might have been a conscious and deliberate choice on the part of > > the TSAN guys. Volatile does not imply any ordering in the standard > > (other than the compiler avoiding emitting volatile operations out of > > order), but some compilers did emit memory-barrier instructions for > > volatile accesses. Which resulted in a lot of problems when such code > > found compilers that did not cause the CPU to order volatile operations. > > > > So a lot of people decided to thrown the volatile baby out with the > > unordered bathwather. ;-) > > Thanks for the input, I think TSAN was indeed worried about > memory-ordering even if relaxed ordering was intended. I think there > is a way to tell TSAN to shut-up in such situations but in my last > Chrome sprint, I just used the atomic API with relaxed ordering and > called it a day. :-) Fair enough! Note that the Linux kernel's version of TSAN, which is KCSAN, does interpret volatile accesses more or less as if they were relaxed atomics. So TSAN could change, but I don't have a dog in that fight. ;-) Thanx, Paul
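Concretely, KCSAN's default configuration distinguishes the two kinds of access roughly as follows (a sketch; the variable is invented here):

#include <linux/compiler.h>

extern int shared_counter;	/* updated concurrently elsewhere */

int kcsan_view(void)
{
	int a = shared_counter;			/* plain access: may be reported as a data race */
	int b = READ_ONCE(shared_counter);	/* marked access: treated like a relaxed atomic */

	return a + b;
}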
On Sat, Aug 05, 2023 at 12:33:03AM +0800, Alan Huang wrote: > > >> Yes, a write-write data race where the value is the same is also fine. > >> > >> But they are still data race, if the compiler is within its right to do anything it likes (due to data race), > >> we still need WRITE_ONCE() in these cases, though it’s semantically safe. > >> > >> IIUC, even with _ONCE(), the compiler is within its right do anything according to the standard (at least before the upcoming C23), because the standard doesn’t consider a volatile access to be atomic. > > > > Volatile accesses are not specified very well in the standard. However, > > as a practical matter, compilers that wish to be able to device drivers > > (whether in kernels or userspace applications) must compile those volatile > > accesses in such a way to allow reliable device drivers to be written. > > > >> However, the kernel consider the volatile access to be atomic, right? > > > > The compiler must therefore act as if a volatile access to an aligned > > machine-word size location is atomic. To see this, consider accesses > > to memory that is shared by a device driver and that device's firmware, > > both of which are written in either C or C++. > > I learned these things a few months ago. But still thank you! > > The real problem is that there may be a data race at line 5, so Joel is correct that the compiler > can cache the value loaded from line 5 according to the standard given that the standard says that > a data race result in undefined behavior, so the compiler might be allowed to do anything. But from the > perspective of the kernel, line 5 is likely a diagnostic read, so it’s fine without READ_ONCE() and the > compiler is not allowed to cache the value. > > This situation is like the volatile access. > > Am I missing something? I think you have it right. The point is that we are sometimes more concerned about focusing KCSAN diagnostics on the core concurrent algorithm, and are willing to take the very low risk of messed-up diagnostic output in order to get simpler and better KCSAN diagnostics on the main algorithm. So in that case, we use data_race() on the diagnostics and other markings in the main algorithm. For example, suppose that we had a core algorithm that relied on strict locking. In that case, we want to use unmarked plain C-language accesses in the core algorithm, which will allow KCSAN to flag and accesses that are not protected by the lock. But it might be bad for the diagnostic code to acquire that lock, as this would suppress diagnostics in the case where the lock was held for too long a time period. Using data_race() in the diagnostic code addresses this situation. Thanx, Paul > > Does that help? > > > > Thanx, Paul > > > >> BTW, line 5 in the example is likely to be optimized away. And yes, the compiler can cache the value loaded from line 5 from the perspective of undefined behavior, even if I believe it would be a compiler bug from the perspective of kernel. > >> > >>> result will not change the semantics of the program. But otherwise, > >>> that's bad. > >>> > >>> [1] https://lwn.net/Articles/793253/#Store%20Tearing > >>> > >>> thanks, > >>> > >>> - Joel > >>> > >>> > >>>> > >>>>> > >>>>> Thanks. > >>>>> > >>>>> > >>>>> > >>>>>> > >>>>>>> +plain accesses of a memory location with rcu_dereference() of the same memory > >>>>>>> +location, in code involved in a data race. 
> >>>>>>> + > >>>>>>> In short, updaters use rcu_assign_pointer() and readers use > >>>>>>> rcu_dereference(), and these two RCU API elements work together to > >>>>>>> ensure that readers have a consistent view of newly added data elements. > >>>>>>> -- > >>>>>>> 2.41.0.585.gd2178a4bd4-goog > >
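A sketch of the locking example Paul describes (all names invented here): the core algorithm stays unmarked so KCSAN flags any access that forgot the lock, while the lockless diagnostic is wrapped in data_race() and never takes the lock:

#include <linux/spinlock.h>
#include <linux/compiler.h>
#include <linux/printk.h>

static DEFINE_SPINLOCK(counter_lock);
static unsigned long counter;

void counter_inc(void)
{
	spin_lock(&counter_lock);
	counter++;			/* plain access, protected by the lock */
	spin_unlock(&counter_lock);
}

void counter_diag(void)
{
	/* Lockless diagnostic read: a stale or racy value is acceptable
	 * here, and data_race() tells KCSAN the race is intentional without
	 * hiding missing-lock bugs in the core algorithm. */
	pr_info("counter is approximately %lu\n", data_race(counter));
}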
diff --git a/Documentation/RCU/Design/Requirements/Requirements.rst b/Documentation/RCU/Design/Requirements/Requirements.rst
index f3b605285a87..e0b896d3fb9b 100644
--- a/Documentation/RCU/Design/Requirements/Requirements.rst
+++ b/Documentation/RCU/Design/Requirements/Requirements.rst
@@ -376,6 +376,38 @@ mechanism, most commonly locking or reference counting
 .. |high-quality implementation of C11 memory_order_consume [PDF]| replace:: high-quality implementation of C11 ``memory_order_consume`` [PDF]
 .. _high-quality implementation of C11 memory_order_consume [PDF]: http://www.rdrop.com/users/paulmck/RCU/consume.2015.07.13a.pdf
 
+Note that there can be strange side effects (due to compiler optimizations) if
+``gp`` is ever accessed using a plain load (i.e. without ``READ_ONCE()`` or
+``rcu_dereference()``), potentially hurting any succeeding
+``rcu_dereference()``. For example, consider the code:
+
+ ::
+
+    1 bool do_something_gp(void)
+    2 {
+    3   void *tmp;
+    4   rcu_read_lock();
+    5   tmp = gp; // Plain-load of gp.
+    6   printk("Point gp = %p\n", tmp);
+    7
+    8   p = rcu_dereference(gp);
+    9   if (p) {
+   10     do_something(p->a, p->b);
+   11     rcu_read_unlock();
+   12     return true;
+   13   }
+   14   rcu_read_unlock();
+   15   return false;
+   16 }
+
+The behavior of plain accesses involved in a data race is non-deterministic in
+the face of compiler optimizations. Since accesses to the ``gp`` pointer are
+by design a data race, the compiler could trip this code by caching the value
+of ``gp`` into a register in line 5, and then using the value of the register
+to satisfy the load in line 10. Thus it is important to never mix
+plain accesses of a memory location with rcu_dereference() of the same memory
+location in code involved in a data race.
+
 In short, updaters use rcu_assign_pointer() and readers use
 rcu_dereference(), and these two RCU API elements work together to
 ensure that readers have a consistent view of newly added data elements.
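For comparison, the example in the patch could also be restructured so the diagnostic print reuses the value returned by rcu_dereference(), avoiding the mixed plain/marked access entirely. A sketch only, not part of the patch, again assuming the usual struct foo and gp from the surrounding document:

bool do_something_gp(void)
{
	struct foo *p;

	rcu_read_lock();
	p = rcu_dereference(gp);	/* single marked load of gp */
	printk("Point gp = %p\n", p);	/* diagnostic reuses the same value */
	if (p) {
		do_something(p->a, p->b);
		rcu_read_unlock();
		return true;
	}
	rcu_read_unlock();
	return false;
}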