Message ID | 20221202125100.30146-1-jonas.oberhauser@huaweicloud.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:f944:0:0:0:0:0 with SMTP id q4csp827323wrr; Fri, 2 Dec 2022 04:55:30 -0800 (PST) X-Google-Smtp-Source: AA0mqf7YdUKQPOFq3uR9LsJ+2QPX1hWz79QQjTGEv0v/Rto8DNbbNp6S8SBXTrQkTRg7U1LVkooc X-Received: by 2002:a17:906:2645:b0:781:d0c1:4434 with SMTP id i5-20020a170906264500b00781d0c14434mr59471469ejc.756.1669985730458; Fri, 02 Dec 2022 04:55:30 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1669985730; cv=none; d=google.com; s=arc-20160816; b=mDVjac+tg5nVl2y8h1ttO3HxZKes+Ahosn/ApoW37gqSuYh0w5mZpxbz9W3dkyazT6 N7I8eOa437bAxKQmJqFQqXJRRWZOkIza1FtDikgMeU8S5ulfCsk4IG/DLOJg7sJDXhrK 3vuKriISDamtO0nnjfMQ25cz/uXLRSVe+uKR3gXPmpwdHbxPgJANrpZDyRq3rNQEb+zo 7tNN9VwOYhK8+sI8SCVWIXiS3hI6sLIKnb1pDsi8ILQUMJI58sKBv4OM7YaNfP4tZuGh g+4xg1vyBNvBxRWH4wDB7ejKtUioT29ojXKQro9h71ZscKOzHDKEqGL5iVWSJdyo5tf+ uHWg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:message-id:date:subject:cc:to:from; bh=iYAnj8Efn866AbCNAWERP5lze7GNA1pwBjFQkTXA9G0=; b=g33uGcRfluRvxEmWSjT9q5RdpzRMk/j1IOIqR4G2T0KYIfLK01rZ+2cica5JtXFlUo EawVYIv+eog8aaS7o4Bk/KsIPDyXg57Z5P+NuDtHWtXKthDC7wugRsnPexo/lJLG10dj rbezC+s8GfVHd84hxCDQ18lNWTggOgbbQpIey5XDt+Ts6GWDE0JMAXTVPARYrZ8F82vu /Ts+3uY4m9cPOSSLTd9kplVE+qGeEff3jzA5u2QMhTUsqPkH+Ef4/bv5NAeTSU0qGfRS IxpP8PJHuYuBmVm4JLXoSVklDKQ75SDSdHb3JgJ+MjcmdVtg9tJuUtWPHkBXXChPGpg6 VtQQ== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id wu9-20020a170906eec900b007adfe2889efsi6210394ejb.607.2022.12.02.04.55.06; Fri, 02 Dec 2022 04:55:30 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232588AbiLBMrF (ORCPT <rfc822;lhua1029@gmail.com> + 99 others); Fri, 2 Dec 2022 07:47:05 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:43304 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232450AbiLBMrD (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Fri, 2 Dec 2022 07:47:03 -0500 Received: from frasgout13.his.huawei.com (frasgout13.his.huawei.com [14.137.139.46]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id DC7AED4AE7 for <linux-kernel@vger.kernel.org>; Fri, 2 Dec 2022 04:47:01 -0800 (PST) Received: from mail02.huawei.com (unknown [172.18.147.229]) by frasgout13.his.huawei.com (SkyGuard) with ESMTP id 4NNswG4zGhz9xHYT for <linux-kernel@vger.kernel.org>; Fri, 2 Dec 2022 20:39:58 +0800 (CST) Received: from huaweicloud.com (unknown [10.206.133.88]) by APP2 (Coremail) with SMTP id GxC2BwAXSfWj84ljytWyAA--.50887S2; Fri, 02 Dec 2022 13:46:38 +0100 (CET) From: Jonas Oberhauser <jonas.oberhauser@huaweicloud.com> To: paulmck@kernel.org Cc: stern@rowland.harvard.edu, parri.andrea@gmail.com, will@kernel.org, peterz@infradead.org, boqun.feng@gmail.com, npiggin@gmail.com, dhowells@redhat.com, j.alglave@ucl.ac.uk, luc.maranget@inria.fr, akiyks@gmail.com, dlustig@nvidia.com, joel@joelfernandes.org, urezki@gmail.com, quic_neeraju@quicinc.com, frederic@kernel.org, linux-kernel@vger.kernel.org, Jonas Oberhauser <jonas.oberhauser@huawei.com> Subject: [PATCH v2] tools: memory-model: Make plain accesses carry dependencies Date: Fri, 2 Dec 2022 13:51:00 +0100 Message-Id: <20221202125100.30146-1-jonas.oberhauser@huaweicloud.com> X-Mailer: git-send-email 2.17.1 X-CM-TRANSID: GxC2BwAXSfWj84ljytWyAA--.50887S2 X-Coremail-Antispam: 1UD129KBjvJXoW3Wry8KFWDCw43tF1kCw4fAFb_yoW7XFyDp3 ykK345KF4ktr9xZF97Gw17WFyfWan7Cr1UJrna9ry09r45ur1Fyry3KryYvFy8ur4kA3WU urWYqF40kw1kJaDanT9S1TB71UUUUU7qnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDU0xBIdaVrnRJUUUkYb4IE77IF4wAFF20E14v26ryj6rWUM7CY07I20VC2zVCF04k2 6cxKx2IYs7xG6r106r1rM7CIcVAFz4kK6r1j6r18M28lY4IEw2IIxxk0rwA2F7IY1VAKz4 vEj48ve4kI8wA2z4x0Y4vE2Ix0cI8IcVAFwI0_Jr0_JF4l84ACjcxK6xIIjxv20xvEc7Cj xVAFwI0_Gr0_Cr1l84ACjcxK6I8E87Iv67AKxVW8JVWxJwA2z4x0Y4vEx4A2jsIEc7CjxV AFwI0_Gr0_Gr1UM2AIxVAIcxkEcVAq07x20xvEncxIr21l5I8CrVACY4xI64kE6c02F40E x7xfMcIj6xIIjxv20xvE14v26r106r15McIj6I8E87Iv67AKxVWUJVW8JwAm72CE4IkC6x 0Yz7v_Jr0_Gr1lF7xvr2IYc2Ij64vIr41lFIxGxcIEc7CjxVA2Y2ka0xkIwI1l42xK82IY c2Ij64vIr41l4I8I3I0E4IkC6x0Yz7v_Jr0_Gr1lx2IqxVAqx4xG67AKxVWUJVWUGwC20s 026x8GjcxK67AKxVWUGVWUWwC2zVAF1VAY17CE14v26r4a6rW5MIIYrxkI7VAKI48JMIIF 0xvE2Ix0cI8IcVAFwI0_Jr0_JF4lIxAIcVC0I7IYx2IY6xkF7I0E14v26r4j6F4UMIIF0x vE42xK8VAvwI8IcIk0rVW3JVWrJr1lIxAIcVC2z280aVAFwI0_Jr0_Gr1lIxAIcVC2z280 aVCY1x0267AKxVW8JVW8JrUvcSsGvfC2KfnxnUUI43ZEXa7IUbG2NtUUUUU== X-CM-SenderInfo: 5mrqt2oorev25kdx2v3u6k3tpzhluzxrxghudrp/ X-CFilter-Loop: Reflected X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,SPF_HELO_NONE, SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1751106957344305242?= X-GMAIL-MSGID: =?utf-8?q?1751106957344305242?= |
Series |
[v2] tools: memory-model: Make plain accesses carry dependencies
|
|
Commit Message
Jonas Oberhauser
Dec. 2, 2022, 12:51 p.m. UTC
From: Jonas Oberhauser <jonas.oberhauser@huawei.com> As reported by Viktor, plain accesses in LKMM are weaker than accesses to registers: the latter carry dependencies but the former do not. This is exemplified in the following snippet: int r = READ_ONCE(*x); WRITE_ONCE(*y, r); Here a data dependency links the READ_ONCE() to the WRITE_ONCE(), preserving their order, because the model treats r as a register. If r is turned into a memory location accessed by plain accesses, however, the link is broken and the order between READ_ONCE() and WRITE_ONCE() is no longer preserved. This is too conservative, since any optimizations on plain accesses that might break dependencies are also possible on registers; it also contradicts the intuitive notion of "dependency" as the data stored by the WRITE_ONCE() does depend on the data read by the READ_ONCE(), independently of whether r is a register or a memory location. This is resolved by redefining all dependencies to include dependencies carried by memory accesses; a dependency is said to be carried by memory accesses (in the model: carry-dep) from one load to another load if the initial load is followed by an arbitrarily long sequence alternating between stores and loads of the same thread, where the data of each store depends on the previous load, and is read by the next load. Any dependency linking the final load in the sequence to another access also links the initial load in the sequence to that access. Reported-by: Viktor Vafeiadis <viktor@mpi-sws.org> Signed-off-by: Jonas Oberhauser <jonas.oberhauser@huawei.com> Reviewed-by: Reviewed-by: Alan Stern <stern@rowland.harvard.edu> --- .../Documentation/explanation.txt | 9 +++++- tools/memory-model/linux-kernel.bell | 6 ++++ .../litmus-tests/dep+plain.litmus | 31 +++++++++++++++++++ 3 files changed, 45 insertions(+), 1 deletion(-) create mode 100644 tools/memory-model/litmus-tests/dep+plain.litmus
Comments
[Dropping most CCs]
Hi Jonas,
A comment on the way this patch was sent.
On Fri, 2 Dec 2022 13:51:00 +0100, Jonas Oberhauser wrote:
> From: Jonas Oberhauser <jonas.oberhauser@huawei.com>
The sender of this patch reads
"Jonas Oberhauser <jonas.oberhauser@huaweicloud.com>".
Is there any technical reason you've sent this patch in this way?
Thanks, Akira
[...]
On Fri, Dec 02, 2022 at 01:51:00PM +0100, Jonas Oberhauser wrote: > From: Jonas Oberhauser <jonas.oberhauser@huawei.com> > > As reported by Viktor, plain accesses in LKMM are weaker than > accesses to registers: the latter carry dependencies but the former > do not. This is exemplified in the following snippet: > > int r = READ_ONCE(*x); > WRITE_ONCE(*y, r); > > Here a data dependency links the READ_ONCE() to the WRITE_ONCE(), > preserving their order, because the model treats r as a register. > If r is turned into a memory location accessed by plain accesses, > however, the link is broken and the order between READ_ONCE() and > WRITE_ONCE() is no longer preserved. > > This is too conservative, since any optimizations on plain > accesses that might break dependencies are also possible on > registers; it also contradicts the intuitive notion of "dependency" > as the data stored by the WRITE_ONCE() does depend on the data read > by the READ_ONCE(), independently of whether r is a register or a > memory location. > > This is resolved by redefining all dependencies to include > dependencies carried by memory accesses; a dependency is said to be > carried by memory accesses (in the model: carry-dep) from one load > to another load if the initial load is followed by an arbitrarily > long sequence alternating between stores and loads of the same > thread, where the data of each store depends on the previous load, > and is read by the next load. > > Any dependency linking the final load in the sequence to another > access also links the initial load in the sequence to that access. > > Reported-by: Viktor Vafeiadis <viktor@mpi-sws.org> > Signed-off-by: Jonas Oberhauser <jonas.oberhauser@huawei.com> > Reviewed-by: Reviewed-by: Alan Stern <stern@rowland.harvard.edu> s/Reviewed-by: Reviewed-by:/Reviewed-by:^2 to save some space ? ;-) Joke aside, I wonder is this patch a first step to solve the OOTA problem you reported in OSS: https://static.sched.com/hosted_files/osseu2022/e1/oss-eu22-jonas.pdf ? /me catching up slowly on that topic, so I'm curious. If so maybe it's better to put the link in the commit log I think. Regards, Boqun > --- > .../Documentation/explanation.txt | 9 +++++- > tools/memory-model/linux-kernel.bell | 6 ++++ > .../litmus-tests/dep+plain.litmus | 31 +++++++++++++++++++ > 3 files changed, 45 insertions(+), 1 deletion(-) > create mode 100644 tools/memory-model/litmus-tests/dep+plain.litmus > > diff --git a/tools/memory-model/Documentation/explanation.txt b/tools/memory-model/Documentation/explanation.txt > index e901b47236c3..8e7085238470 100644 > --- a/tools/memory-model/Documentation/explanation.txt > +++ b/tools/memory-model/Documentation/explanation.txt > @@ -2575,7 +2575,7 @@ smp_store_release() -- which is basically how the Linux kernel treats > them. > > Although we said that plain accesses are not linked by the ppo > -relation, they do contribute to it indirectly. Namely, when there is > +relation, they do contribute to it indirectly. Firstly, when there is > an address dependency from a marked load R to a plain store W, > followed by smp_wmb() and then a marked store W', the LKMM creates a > ppo link from R to W'. The reasoning behind this is perhaps a little > @@ -2584,6 +2584,13 @@ for this source code in which W' could execute before R. Just as with > pre-bounding by address dependencies, it is possible for the compiler > to undermine this relation if sufficient care is not taken. > > +Secondly, plain accesses can carry dependencies: If a data dependency > +links a marked load R to a store W, and the store is read by a load R' > +from the same thread, then the data loaded by R' depends on the data > +loaded originally by R. Thus, if R' is linked to any access X by a > +dependency, R is also linked to access X by the same dependency, even > +if W' or R' (or both!) are plain. > + > There are a few oddball fences which need special treatment: > smp_mb__before_atomic(), smp_mb__after_atomic(), and > smp_mb__after_spinlock(). The LKMM uses fence events with special > diff --git a/tools/memory-model/linux-kernel.bell b/tools/memory-model/linux-kernel.bell > index 65c32ca9d5ea..5f0b98c1ab81 100644 > --- a/tools/memory-model/linux-kernel.bell > +++ b/tools/memory-model/linux-kernel.bell > @@ -76,3 +76,9 @@ flag ~empty different-values(srcu-rscs) as srcu-bad-nesting > let Marked = (~M) | IW | Once | Release | Acquire | domain(rmw) | range(rmw) | > LKR | LKW | UL | LF | RL | RU > let Plain = M \ Marked > + > +(* Redefine dependencies to include those carried through plain accesses *) > +let carry-dep = (data ; rfi)* > +let addr = carry-dep ; addr > +let ctrl = carry-dep ; ctrl > +let data = carry-dep ; data > diff --git a/tools/memory-model/litmus-tests/dep+plain.litmus b/tools/memory-model/litmus-tests/dep+plain.litmus > new file mode 100644 > index 000000000000..ebf84daa9a59 > --- /dev/null > +++ b/tools/memory-model/litmus-tests/dep+plain.litmus > @@ -0,0 +1,31 @@ > +C dep+plain > + > +(* > + * Result: Never > + * > + * This litmus test demonstrates that in LKMM, plain accesses > + * carry dependencies much like accesses to registers: > + * The data stored to *z1 and *z2 by P0() originates from P0()'s > + * READ_ONCE(), and therefore using that data to compute the > + * conditional of P0()'s if-statement creates a control dependency > + * from that READ_ONCE() to P0()'s WRITE_ONCE(). > + *) > + > +{} > + > +P0(int *x, int *y, int *z1, int *z2) > +{ > + int a = READ_ONCE(*x); > + *z1 = a; > + *z2 = *z1; > + if (*z2 == 1) > + WRITE_ONCE(*y, 1); > +} > + > +P1(int *x, int *y) > +{ > + int r = smp_load_acquire(y); > + smp_store_release(x, r); > +} > + > +exists (x=1 /\ y=1) > -- > 2.17.1 >
On Sat, Dec 03, 2022 at 11:58:36AM +0000, Jonas Oberhauser wrote: > > > -----Original Message----- > From: Boqun Feng [mailto:boqun.feng@gmail.com] > Sent: Friday, December 2, 2022 7:50 PM > > > > Reviewed-by: Reviewed-by: Alan Stern <stern@rowland.harvard.edu> > > > s/Reviewed-by: Reviewed-by:/Reviewed-by:^2 to save some space ? ;-) > > > Oh, I didn't know I'm allowed to compress things like that! Can I use ² as well to save another character? Heh! I might miss that, and who knows? The bots might think that "²" was the first letter of your name. ;-) > > I wonder is this patch a first step to solve the OOTA problem you reported in OSS: > > https://static.sched.com/hosted_files/osseu2022/e1/oss-eu22-jonas.pdf > > If so maybe it's better to put the link in the commit log I think. > > It's not directly related to that specific problem, it does solve some other OOTA issues though. > If you think we should link to the talk, there's also a video with slightly more updated slides from the actual talk: https://www.youtube.com/watch?v=iFDKhIxKhoQ > do you think I should link to both then? It is not hard for me to add that in if people believe that it should be included. But default is lazy in this case. ;-) Thanx, Paul
On Sat, Dec 03, 2022 at 11:02:26AM -0800, Paul E. McKenney wrote: > On Sat, Dec 03, 2022 at 11:58:36AM +0000, Jonas Oberhauser wrote: > > > > > > -----Original Message----- > > From: Boqun Feng [mailto:boqun.feng@gmail.com] > > Sent: Friday, December 2, 2022 7:50 PM > > > > > > Reviewed-by: Reviewed-by: Alan Stern <stern@rowland.harvard.edu> > > > > > s/Reviewed-by: Reviewed-by:/Reviewed-by:^2 to save some space ? ;-) > > > > > > Oh, I didn't know I'm allowed to compress things like that! Can I use ² as well to save another character? > > Heh! I might miss that, and who knows? The bots might think that "²" > was the first letter of your name. ;-) > > > > I wonder is this patch a first step to solve the OOTA problem you reported in OSS: > > > https://static.sched.com/hosted_files/osseu2022/e1/oss-eu22-jonas.pdf > > > If so maybe it's better to put the link in the commit log I think. > > > > It's not directly related to that specific problem, it does solve some other OOTA issues though. > > If you think we should link to the talk, there's also a video with slightly more updated slides from the actual talk: https://www.youtube.com/watch?v=iFDKhIxKhoQ > > do you think I should link to both then? > > It is not hard for me to add that in if people believe that it should be > included. But default is lazy in this case. ;-) > I brought this up because, as we recently experience in RCU code, we need answers of "why we did this?" to the future us ;-) I agree with Alan, this seems like a good idea, but having some big picture of why we do this may be better. Regards, Boqun > Thanx, Paul
On Sat, Dec 03, 2022 at 11:02:26AM -0800, Paul E. McKenney wrote: > On Sat, Dec 03, 2022 at 11:58:36AM +0000, Jonas Oberhauser wrote: > > > > > > -----Original Message----- > > From: Boqun Feng [mailto:boqun.feng@gmail.com] > > Sent: Friday, December 2, 2022 7:50 PM > > > > I wonder is this patch a first step to solve the OOTA problem you reported in OSS: > > > https://static.sched.com/hosted_files/osseu2022/e1/oss-eu22-jonas.pdf > > > If so maybe it's better to put the link in the commit log I think. > > > > It's not directly related to that specific problem, it does solve some other OOTA issues though. > > If you think we should link to the talk, there's also a video with slightly more updated slides from the actual talk: https://www.youtube.com/watch?v=iFDKhIxKhoQ > > do you think I should link to both then? > > It is not hard for me to add that in if people believe that it should be > included. But default is lazy in this case. ;-) I don't think there's any need to mention that video in the commit log. It's an introductory talk, and it's pretty safe to assume that anyone reading the commit because they are interested in the LKMM in great detail is already beyond the introductory level. On the other hand, it wouldn't hurt to include a Link: tag to an appropriate message in this email thread. (I leave it up to Paul to decide which message is most "appropriate" -- there may not be a good candidate, because a lot of the messages were not CC'ed to LKML.) Alan
On Sat, Dec 03, 2022 at 12:14:29PM -0800, Boqun Feng wrote: > On Sat, Dec 03, 2022 at 11:02:26AM -0800, Paul E. McKenney wrote: > > On Sat, Dec 03, 2022 at 11:58:36AM +0000, Jonas Oberhauser wrote: > > > > > > > > > -----Original Message----- > > > From: Boqun Feng [mailto:boqun.feng@gmail.com] > > > Sent: Friday, December 2, 2022 7:50 PM > > > > > > > > Reviewed-by: Reviewed-by: Alan Stern <stern@rowland.harvard.edu> > > > > > > > s/Reviewed-by: Reviewed-by:/Reviewed-by:^2 to save some space ? ;-) > > > > > > > > > Oh, I didn't know I'm allowed to compress things like that! Can I use ² as well to save another character? > > > > Heh! I might miss that, and who knows? The bots might think that "²" > > was the first letter of your name. ;-) > > > > > > I wonder is this patch a first step to solve the OOTA problem you reported in OSS: > > > > https://static.sched.com/hosted_files/osseu2022/e1/oss-eu22-jonas.pdf > > > > If so maybe it's better to put the link in the commit log I think. > > > > > > It's not directly related to that specific problem, it does solve some other OOTA issues though. > > > If you think we should link to the talk, there's also a video with slightly more updated slides from the actual talk: https://www.youtube.com/watch?v=iFDKhIxKhoQ > > > do you think I should link to both then? > > > > It is not hard for me to add that in if people believe that it should be > > included. But default is lazy in this case. ;-) > > > > I brought this up because, as we recently experience in RCU code, we > need answers of "why we did this?" to the future us ;-) > > I agree with Alan, this seems like a good idea, but having some big > picture of why we do this may be better. Fair enough! How about something like this as a new paragraph at the end of the commit log? Thanx, Paul ------------------------------------------------------------------------ For more on LKMM and dependencies, please see this Open Source Summit talk: https://www.youtube.com/watch?v=iFDKhIxKhoQ
On Sat, Dec 03, 2022 at 03:34:20PM -0500, stern@rowland.harvard.edu wrote: > On Sat, Dec 03, 2022 at 11:02:26AM -0800, Paul E. McKenney wrote: > > On Sat, Dec 03, 2022 at 11:58:36AM +0000, Jonas Oberhauser wrote: > > > > > > > > > -----Original Message----- > > > From: Boqun Feng [mailto:boqun.feng@gmail.com] > > > Sent: Friday, December 2, 2022 7:50 PM > > > > > > I wonder is this patch a first step to solve the OOTA problem you reported in OSS: > > > > https://static.sched.com/hosted_files/osseu2022/e1/oss-eu22-jonas.pdf > > > > If so maybe it's better to put the link in the commit log I think. > > > > > > It's not directly related to that specific problem, it does solve some other OOTA issues though. > > > If you think we should link to the talk, there's also a video with slightly more updated slides from the actual talk: https://www.youtube.com/watch?v=iFDKhIxKhoQ > > > do you think I should link to both then? > > > > It is not hard for me to add that in if people believe that it should be > > included. But default is lazy in this case. ;-) > > I don't think there's any need to mention that video in the commit log. > It's an introductory talk, and it's pretty safe to assume that anyone > reading the commit because they are interested in the LKMM in great > detail is already beyond the introductory level. > > On the other hand, it wouldn't hurt to include a Link: tag to an > appropriate message in this email thread. (I leave it up to Paul to > decide which message is most "appropriate" -- there may not be a good > candidate, because a lot of the messages were not CC'ed to LKML.) For this approach, I would add this: Link: https://lore.kernel.org/all/4262e55407294a5989e766bc4dc48293@huawei.com/ I could of course do both the extra paragraph -and- the Link:. ;-) Thoughts? Thanx, Paul
On Sat, Dec 03, 2022 at 12:44:05PM -0800, Paul E. McKenney wrote: > On Sat, Dec 03, 2022 at 03:34:20PM -0500, stern@rowland.harvard.edu wrote: > > On Sat, Dec 03, 2022 at 11:02:26AM -0800, Paul E. McKenney wrote: > > > On Sat, Dec 03, 2022 at 11:58:36AM +0000, Jonas Oberhauser wrote: > > > > > > > > > > > > -----Original Message----- > > > > From: Boqun Feng [mailto:boqun.feng@gmail.com] > > > > Sent: Friday, December 2, 2022 7:50 PM > > > > > > > > I wonder is this patch a first step to solve the OOTA problem you reported in OSS: > > > > > https://static.sched.com/hosted_files/osseu2022/e1/oss-eu22-jonas.pdf > > > > > If so maybe it's better to put the link in the commit log I think. > > > > > > > > It's not directly related to that specific problem, it does solve some other OOTA issues though. > > > > If you think we should link to the talk, there's also a video with slightly more updated slides from the actual talk: https://www.youtube.com/watch?v=iFDKhIxKhoQ > > > > do you think I should link to both then? > > > > > > It is not hard for me to add that in if people believe that it should be > > > included. But default is lazy in this case. ;-) > > > > I don't think there's any need to mention that video in the commit log. > > It's an introductory talk, and it's pretty safe to assume that anyone > > reading the commit because they are interested in the LKMM in great > > detail is already beyond the introductory level. > > > > On the other hand, it wouldn't hurt to include a Link: tag to an > > appropriate message in this email thread. (I leave it up to Paul to > > decide which message is most "appropriate" -- there may not be a good > > candidate, because a lot of the messages were not CC'ed to LKML.) > > For this approach, I would add this: > > Link: https://lore.kernel.org/all/4262e55407294a5989e766bc4dc48293@huawei.com/ > > I could of course do both the extra paragraph -and- the Link:. ;-) > > Thoughts? > I think only having Link: is fine ;-) And I agree with Alan, no need to mention that video. Thank you! Regards, Boqun > Thanx, Paul
On Sat, Dec 03, 2022 at 12:52:02PM -0800, Boqun Feng wrote: > On Sat, Dec 03, 2022 at 12:44:05PM -0800, Paul E. McKenney wrote: > > On Sat, Dec 03, 2022 at 03:34:20PM -0500, stern@rowland.harvard.edu wrote: > > > On Sat, Dec 03, 2022 at 11:02:26AM -0800, Paul E. McKenney wrote: > > > > On Sat, Dec 03, 2022 at 11:58:36AM +0000, Jonas Oberhauser wrote: > > > > > > > > > > > > > > > -----Original Message----- > > > > > From: Boqun Feng [mailto:boqun.feng@gmail.com] > > > > > Sent: Friday, December 2, 2022 7:50 PM > > > > > > > > > > I wonder is this patch a first step to solve the OOTA problem you reported in OSS: > > > > > > https://static.sched.com/hosted_files/osseu2022/e1/oss-eu22-jonas.pdf > > > > > > If so maybe it's better to put the link in the commit log I think. > > > > > > > > > > It's not directly related to that specific problem, it does solve some other OOTA issues though. > > > > > If you think we should link to the talk, there's also a video with slightly more updated slides from the actual talk: https://www.youtube.com/watch?v=iFDKhIxKhoQ > > > > > do you think I should link to both then? > > > > > > > > It is not hard for me to add that in if people believe that it should be > > > > included. But default is lazy in this case. ;-) > > > > > > I don't think there's any need to mention that video in the commit log. > > > It's an introductory talk, and it's pretty safe to assume that anyone > > > reading the commit because they are interested in the LKMM in great > > > detail is already beyond the introductory level. > > > > > > On the other hand, it wouldn't hurt to include a Link: tag to an > > > appropriate message in this email thread. (I leave it up to Paul to > > > decide which message is most "appropriate" -- there may not be a good > > > candidate, because a lot of the messages were not CC'ed to LKML.) > > > > For this approach, I would add this: > > > > Link: https://lore.kernel.org/all/4262e55407294a5989e766bc4dc48293@huawei.com/ > > > > I could of course do both the extra paragraph -and- the Link:. ;-) > > > > Thoughts? > > > > I think only having Link: is fine ;-) And I agree with Alan, no need to > mention that video. Very good, I will add the Link: on the next rebase. And I will even refrain from adding the URL of the infamous "why not both" video to this email. ;-) Thanx, Paul
On Sat, Dec 03, 2022 at 12:44:05PM -0800, Paul E. McKenney wrote: > On Sat, Dec 03, 2022 at 03:34:20PM -0500, stern@rowland.harvard.edu wrote: > > On Sat, Dec 03, 2022 at 11:02:26AM -0800, Paul E. McKenney wrote: > > > On Sat, Dec 03, 2022 at 11:58:36AM +0000, Jonas Oberhauser wrote: > > > > > > > > > > > > -----Original Message----- > > > > From: Boqun Feng [mailto:boqun.feng@gmail.com] > > > > Sent: Friday, December 2, 2022 7:50 PM > > > > > > > > I wonder is this patch a first step to solve the OOTA problem you reported in OSS: > > > > > https://static.sched.com/hosted_files/osseu2022/e1/oss-eu22-jonas.pdf > > > > > If so maybe it's better to put the link in the commit log I think. > > > > > > > > It's not directly related to that specific problem, it does solve some other OOTA issues though. > > > > If you think we should link to the talk, there's also a video with slightly more updated slides from the actual talk: https://www.youtube.com/watch?v=iFDKhIxKhoQ > > > > do you think I should link to both then? > > > > > > It is not hard for me to add that in if people believe that it should be > > > included. But default is lazy in this case. ;-) > > > > I don't think there's any need to mention that video in the commit log. > > It's an introductory talk, and it's pretty safe to assume that anyone > > reading the commit because they are interested in the LKMM in great > > detail is already beyond the introductory level. > > > > On the other hand, it wouldn't hurt to include a Link: tag to an > > appropriate message in this email thread. (I leave it up to Paul to > > decide which message is most "appropriate" -- there may not be a good > > candidate, because a lot of the messages were not CC'ed to LKML.) > > For this approach, I would add this: > > Link: https://lore.kernel.org/all/4262e55407294a5989e766bc4dc48293@huawei.com/ There's no point including that link; it merely points to messages containing or commenting on early versions of the commit. It adds very little information not already present in the commit itself. (Have you read any of Linus's criticisms of the Link: tags that people tend to include in patches they send him? It's the same principle.) I was thinking of the discussion which led up to the commit being written, where Jonas first brought up the idea that plain accesses should be able to carry dependencies just like accesses to registers. That's the sort of thing which would give readers some context and understanding of the reasoning behind the commit. They were part of the thread with the subject "RE: Interesting LKMM litmus test". But I can't find those messages on lore.kernel.org (which isn't surprising, as they weren't CC'ed to any mailing lists). > I could of course do both the extra paragraph -and- the Link:. ;-) > > Thoughts? My advice: Omit them both. Alan
On Sat, Dec 03, 2022 at 04:32:19PM -0500, stern@rowland.harvard.edu wrote: > On Sat, Dec 03, 2022 at 12:44:05PM -0800, Paul E. McKenney wrote: > > On Sat, Dec 03, 2022 at 03:34:20PM -0500, stern@rowland.harvard.edu wrote: > > > On Sat, Dec 03, 2022 at 11:02:26AM -0800, Paul E. McKenney wrote: > > > > On Sat, Dec 03, 2022 at 11:58:36AM +0000, Jonas Oberhauser wrote: > > > > > > > > > > > > > > > -----Original Message----- > > > > > From: Boqun Feng [mailto:boqun.feng@gmail.com] > > > > > Sent: Friday, December 2, 2022 7:50 PM > > > > > > > > > > I wonder is this patch a first step to solve the OOTA problem you reported in OSS: > > > > > > https://static.sched.com/hosted_files/osseu2022/e1/oss-eu22-jonas.pdf > > > > > > If so maybe it's better to put the link in the commit log I think. > > > > > > > > > > It's not directly related to that specific problem, it does solve some other OOTA issues though. > > > > > If you think we should link to the talk, there's also a video with slightly more updated slides from the actual talk: https://www.youtube.com/watch?v=iFDKhIxKhoQ > > > > > do you think I should link to both then? > > > > > > > > It is not hard for me to add that in if people believe that it should be > > > > included. But default is lazy in this case. ;-) > > > > > > I don't think there's any need to mention that video in the commit log. > > > It's an introductory talk, and it's pretty safe to assume that anyone > > > reading the commit because they are interested in the LKMM in great > > > detail is already beyond the introductory level. > > > > > > On the other hand, it wouldn't hurt to include a Link: tag to an > > > appropriate message in this email thread. (I leave it up to Paul to > > > decide which message is most "appropriate" -- there may not be a good > > > candidate, because a lot of the messages were not CC'ed to LKML.) > > > > For this approach, I would add this: > > > > Link: https://lore.kernel.org/all/4262e55407294a5989e766bc4dc48293@huawei.com/ > > There's no point including that link; it merely points to messages > containing or commenting on early versions of the commit. It adds very > little information not already present in the commit itself. (Have you > read any of Linus's criticisms of the Link: tags that people tend to > include in patches they send him? It's the same principle.) > > I was thinking of the discussion which led up to the commit being > written, where Jonas first brought up the idea that plain accesses > should be able to carry dependencies just like accesses to registers. > That's the sort of thing which would give readers some context and > understanding of the reasoning behind the commit. They were part of the > thread with the subject "RE: Interesting LKMM litmus test". > > But I can't find those messages on lore.kernel.org (which isn't > surprising, as they weren't CC'ed to any mailing lists). I am OK with them being made public, maybe in a Google document or some such. We would of course also need the consent of everyone else on that thread. > > I could of course do both the extra paragraph -and- the Link:. ;-) > > > > Thoughts? > > My advice: Omit them both. It would be good to reference something or another. ;-) Thanx, Paul
On Sun, Dec 04, 2022 at 12:15:27AM +0000, Jonas Oberhauser wrote: > > > -----Original Message----- > From: Paul E. McKenney [mailto:paulmck@kernel.org] > Sent: Sunday, December 4, 2022 12:11 AM > To: stern@rowland.harvard.edu > > On Sat, Dec 03, 2022 at 04:32:19PM -0500, stern@rowland.harvard.edu wrote: > > > My advice: Omit them both. > > It would be good to reference something or another. ;-) > > I also prefer to not refer to that presentation. > If there is a feeling that more context is needed, I would first > prefer to enhance the commit message itself in some way. (Personally I > don't feel that this is needed, and the imho the issue stands by > itself even without reference to OOTA, which could be resolved fully > independently e.g. by Viktor's suggestion to just axiomatically forbid > OOTA --- the issue addressed by this patch would still exist). If The reason that I'm gving you a hard time is that I haven't seen a real world code usage that needs this fix, maybe there is one and I'm just stupid and not knowing about. Your litmus explains the problem very well but it's better if there is real world code expecting this ordering. Not saying real world code is essential for memory model changes, but without it, I guess the rationale of this patch is "plain accesses shouldn't be weaker than registers" or "This (plain accesses don't provide dependencies) is too conservative", and these don't seem very strong without a bigger motivation behind it. Also I'm in the impression that people love to put READ_ONCE()/WRITE_ONCE() when they have some ordering issues (in real world or with LKMM). Although I don't like this, but you cannot blame people who just want more guarantee allowing their code to work ;-( This is also another reason that I'd like to see strong reasoning of this change. Besides, could you also explain a little bit why only "data;rfi" can be "carry-dep" but "ctrl;rfi" and "addr;rfi" cannot? I think it's because there are special cases when compilers can figure out a condition being true or an address being constant therefore break the dependency? But maybe I'm wrong or missing something. Thank you! (Please don't be mad at me, sometimes I'm just slow to understand things ;-)) Regards, Boqun > that's not satisfactory, I would also consent to publishing the > e-mails from the thread starting where I relayed Viktor's observation > of the relaxed accesses, but I don't recall it saying anything > substantially beyond the current commit log + the documentation > included in the patch. > > best wishes, jonas
On Mon, Dec 05, 2022 at 01:42:46PM +0000, Jonas Oberhauser wrote: > > Besides, could you also explain a little bit why only "data;rfi" can be "carry-dep" but "ctrl;rfi" and "addr;rfi" cannot? I think it's because there are special cases when compilers can figure out a condition being true or an address being constant therefore break the dependency > > Oh, good question. A bit hard for me to write down the answer clearly > (which some people will claim that I don't understand it well myself, > but I beg to differ :) :( :) ). > > In a nutshell, it's because x ->data [Plain] ->rfi y ->... z fulfils > the same role as storing something in a register and then using it in > a subsequent computation; x ->ctrl y ->... z (and ->addr) don't. So > it's not due to smart compilers, just the fact that the other two > cases seem unrelated to the problem being solved, and including them > might introduce some unsoundness (not that I have checked if they do). More can be said here. Consider the following simple example: void P0(int *x, int *y) { int r1, r2; int a[10]; r1 = READ_ONCE(*x); a[r1] = 1; r2 = a[r1]; WRITE_ONCE(*y, r2); } There is an address dependency from the READ_ONCE to the plain store in a[r1]. Then there is an rfi and a data dependency to the WRITE_ONCE. But in this example, the WRITE_ONCE is _not_ ordered after the READ_ONCE, even though they are linked by (addr ; rfi ; data). The compiler knows that the value of r1 does not change between the two plain accesses, so it knows that it can optimize the code to be: r1 = READ_ONCE(*x); r2 = 1; WRITE_ONCE(*y, r2); a[r1] = r2; And then the CPU can execute the WRITE_ONCE before the READ_ONCE. This shows that (addr ; rfi) must not be included in the carry-deps relation. You may be able to come up with a similar argument for (ctrl ; rfi), although it might not be quite as clear. Alan
On Mon, Dec 05, 2022 at 01:42:46PM +0000, Jonas Oberhauser wrote: > > Without it, I guess the rationale of this patch is "plain accesses > > shouldn't be weaker than registers" or "This (plain accesses don't > > provide dependencies) is too conservative", and these don't seem > > very strong without a bigger motivation behind it. > > Good points, thanks. > > > -----Original Message----- > From: Boqun Feng [mailto:boqun.feng@gmail.com] > Sent: Sunday, December 4, 2022 9:33 AM > > > > a real world code usage that needs this fix > > Note that in the standard C world, 'int a' (and even 'register int a') > declares an object, i.e., a memory location that is accessed with > plain accesses. In my opinion, that 'int a' is a "register" in herd7 > is an implementation artifact that hides the current weakness > addressed by this patch. So if you would actually analyze pretty much > any code that looks like > > int a = <some atomic load> > if (a == ...) { <some atomic store> } > > then under the standard interpretation of that code, a is a memory > location, and there is (in current LKMM) no control dependency, and > the code would be considered broken by LKMM. > > Or in another view, whether something that is (provably) never > accessed externally is to be considered a 'register' or a 'memory > location' should be an implementation detail of the compiler etc. > (with the possibility to freely do reg2mem/mem2reg), and not have any > effect on the ordering guarantees. Notably most tools (GenMC for sure > and I think also Dat3M) use such compiler transformations (especially > mem2reg) because they live under the assumption that it doesn't change > anything, but of course with current LKMM that renders all of these > tools unsound; if you would carefully avoid to do any such > transformations, then these tools would also report bugs in a lot more > linux code. > Thanks! This is exactly what I want to see ;-) Now I understand more about the rationale of this change: First, Linux kernel developers expect and use dependencies ordering. Second, herd modeling has the concept of "registers" that carries the dependency because herd also models asm code where "registers" do exist. Third, for C standard or other models' viewpoint, LKMM has some inconsistency since both "int a" and "*z" are "memory locations", however LKMM currently respects dependencies carried by the first but not those carried by the latter. Finally, to close the gap with LKMM+herd with other model tools and C standard, all dependencies that carried by "memory locations" should be enforced in LKMM. I must say I didn't get the "Third" part from your patch at first, now re-read your patch, I think you did mention that in some degree. I guess the reason I did get it first is because my mind is herded ;-) How about adding some comments before "carry-dep", for example: (* Redefine dependencies to include those carried through plain accesses, since herd by default only generates dependencies via "registers", while LKMM enforces dependencies carried by "memory location" of C standard *) I guess Alan and Paul can do better job on wording, so that just expresses my idea. Maybe instead of saying "Make plain accesses carry dependencies", we say "Make memory location carry dependencies" or "Extend dependency carrying from registers to memory locations". Thoughts? > > > > > Also I'm in the impression that people love to put > > READ_ONCE()/WRITE_ONCE() when they have some ordering issues (in > > real world or with LKMM). > > Note that this would require to do > > int a; WRITE_ONCE(&a, <some_atomic_load>); if (READ_ONCE(&a) == ...) > { <some_atomic_store>; } > > to get a control dependency, and imho violate the point of relaxed > marked accesses -- to point out where races are, but no (additional) > ordering is required. > > > Although I don't like this > > I like this and I have forwarded Paul's email to some friends who like "So you are these guys" ;-) I guess I shouldn't say "I don't like this", I'm OK with either way since I also understand the benefit you mentioned below. > not to put relaxed marked/atomic accesses in their code "because it > makes the code less readable". I completely disagree -- marking racy > accesses makes the code more readable because it lets us know when > something is subject to concurrent access, and modifying the logic may > break code of other threads. > > What I don't like is relying on dependencies in the first place, since > they are hard to maintain -- some innocuous optimizations that people > might want to do (like adding some read in front of a bunch of writes > to see if they're really necessary) bypass the ordering imposed by the > dependency. So I generally advise to use acq loads instead, unless > there is a measurable performance impact. But that's a completely > separate discussion... > True. I actually agree with the "acq loads first, unless performance benefit" approach. [I will reply the ctrl/addr in a separate mail] Regards, Boqun
On Mon, Dec 05, 2022 at 11:18:13AM -0500, stern@rowland.harvard.edu wrote: > On Mon, Dec 05, 2022 at 01:42:46PM +0000, Jonas Oberhauser wrote: > > > Besides, could you also explain a little bit why only "data;rfi" can be "carry-dep" but "ctrl;rfi" and "addr;rfi" cannot? I think it's because there are special cases when compilers can figure out a condition being true or an address being constant therefore break the dependency > > > > Oh, good question. A bit hard for me to write down the answer clearly > > (which some people will claim that I don't understand it well myself, > > but I beg to differ :) :( :) ). Nah, I think your answer is clear to me ;-) > > > > In a nutshell, it's because x ->data [Plain] ->rfi y ->... z fulfils > > the same role as storing something in a register and then using it in > > a subsequent computation; x ->ctrl y ->... z (and ->addr) don't. So > > it's not due to smart compilers, just the fact that the other two > > cases seem unrelated to the problem being solved, and including them > > might introduce some unsoundness (not that I have checked if they do). So it's about whether a value can have a dataflow from x to y, right? In that case registers and memory cells should be treated the same by compilers, therefore we can extend the dependencies. > > More can be said here. Consider the following simple example: > > void P0(int *x, int *y) > { > int r1, r2; > int a[10]; > > r1 = READ_ONCE(*x); > a[r1] = 1; > r2 = a[r1]; > WRITE_ONCE(*y, r2); > } > > There is an address dependency from the READ_ONCE to the plain store in > a[r1]. Then there is an rfi and a data dependency to the WRITE_ONCE. > > But in this example, the WRITE_ONCE is _not_ ordered after the > READ_ONCE, even though they are linked by (addr ; rfi ; data). The > compiler knows that the value of r1 does not change between the two > plain accesses, so it knows that it can optimize the code to be: > > r1 = READ_ONCE(*x); > r2 = 1; > WRITE_ONCE(*y, r2); > a[r1] = r2; > > And then the CPU can execute the WRITE_ONCE before the READ_ONCE. This > shows that (addr ; rfi) must not be included in the carry-deps relation. > > You may be able to come up with a similar argument for (ctrl ; rfi), > although it might not be quite as clear. > Thank you, Alan! One question though, can a "smart" compiler optimize out the case below, with the same logic? void P0(int *x, int *y, int *a) { int r1, r2; r1 = READ_ONCE(*x); // A *a = r1 & 0xffff; // B r2 = *a & 0xffff0000; // C WRITE_ONCE(*y, r2); // D } I think we have A ->data B ->rfi C ->data D, however a "smart" compiler can figure out that r2 is actually zero, right? And the code get optimized to: r1 = READ_ONCE(*x); r2 = 0; WRITE_ONCE(*y, r2); *a = r1 & 0xffff; and break the dependency. I know that our memory model is actually unware of the differences of syntatics dependencies vs semantics syntatics, so one may argue that in the (data; rfi) example above the compiler optimization is outside the scope of LKMM, but won't the same reasoning apply to the (addr; rfi) example from you? The WRITE_ONCE() _syntatically_ depends on load of a[r1], therefore even a "smart" compiler can figure out the value, LKMM won't take that into consideration. Am I missing something subtle here? Regards, Boqun > Alan
On Tue, Dec 06, 2022 at 12:46:58PM -0800, Boqun Feng wrote: > On Mon, Dec 05, 2022 at 11:18:13AM -0500, stern@rowland.harvard.edu wrote: > > On Mon, Dec 05, 2022 at 01:42:46PM +0000, Jonas Oberhauser wrote: > > > > Besides, could you also explain a little bit why only "data;rfi" can be "carry-dep" but "ctrl;rfi" and "addr;rfi" cannot? I think it's because there are special cases when compilers can figure out a condition being true or an address being constant therefore break the dependency > > > > > > Oh, good question. A bit hard for me to write down the answer clearly > > > (which some people will claim that I don't understand it well myself, > > > but I beg to differ :) :( :) ). > > Nah, I think your answer is clear to me ;-) > > > > > > > In a nutshell, it's because x ->data [Plain] ->rfi y ->... z fulfils > > > the same role as storing something in a register and then using it in > > > a subsequent computation; x ->ctrl y ->... z (and ->addr) don't. So > > > it's not due to smart compilers, just the fact that the other two > > > cases seem unrelated to the problem being solved, and including them > > > might introduce some unsoundness (not that I have checked if they do). > > So it's about whether a value can have a dataflow from x to y, right? In > that case registers and memory cells should be treated the same by > compilers, therefore we can extend the dependencies. > > > > More can be said here. Consider the following simple example: > > > > void P0(int *x, int *y) > > { > > int r1, r2; > > int a[10]; > > > > r1 = READ_ONCE(*x); > > a[r1] = 1; > > r2 = a[r1]; > > WRITE_ONCE(*y, r2); > > } > > > > There is an address dependency from the READ_ONCE to the plain store in > > a[r1]. Then there is an rfi and a data dependency to the WRITE_ONCE. > > > > But in this example, the WRITE_ONCE is _not_ ordered after the > > READ_ONCE, even though they are linked by (addr ; rfi ; data). The > > compiler knows that the value of r1 does not change between the two > > plain accesses, so it knows that it can optimize the code to be: > > > > r1 = READ_ONCE(*x); > > r2 = 1; > > WRITE_ONCE(*y, r2); > > a[r1] = r2; > > > > And then the CPU can execute the WRITE_ONCE before the READ_ONCE. This > > shows that (addr ; rfi) must not be included in the carry-deps relation. > > > > You may be able to come up with a similar argument for (ctrl ; rfi), > > although it might not be quite as clear. > > > > Thank you, Alan! One question though, can a "smart" compiler optimize > out the case below, with the same logic? > > void P0(int *x, int *y, int *a) > { > int r1, r2; > > r1 = READ_ONCE(*x); // A > > *a = r1 & 0xffff; // B > > r2 = *a & 0xffff0000; // C > > WRITE_ONCE(*y, r2); // D > > } > > I think we have A ->data B ->rfi C ->data D, however a "smart" compiler > can figure out that r2 is actually zero, right? And the code get > optimized to: > > r1 = READ_ONCE(*x); > r2 = 0; > WRITE_ONCE(*y, r2); > *a = r1 & 0xffff; > > and break the dependency. > > I know that our memory model is actually unware of the differences of > syntatics dependencies vs semantics syntatics, so one may argue that in > the (data; rfi) example above the compiler optimization is outside the > scope of LKMM, but won't the same reasoning apply to the (addr; rfi) > example from you? The WRITE_ONCE() _syntatically_ depends on load of > a[r1], therefore even a "smart" compiler can figure out the value, LKMM I guess it should be that r2 (i.e. the load of a[r1]) _syntatically_ depends on the value of r1. Regards, Boqun > won't take that into consideration. > > Am I missing something subtle here? > > Regards, > Boqun > > > Alan
On Mon, Dec 05, 2022 at 01:42:46PM +0000, Jonas Oberhauser wrote: [...] > > Besides, could you also explain a little bit why only "data;rfi" can > > be "carry-dep" but "ctrl;rfi" and "addr;rfi" cannot? I think it's > > because there are special cases when compilers can figure out a > > condition being true or an address being constant therefore break > > the dependency > > Oh, good question. A bit hard for me to write down the answer clearly > (which some people will claim that I don't understand it well myself, > but I beg to differ :) :( :) ). > > In a nutshell, it's because x ->data [Plain] ->rfi y ->... z > fulfils the same role as storing something in a register and then > using it in a subsequent computation; x ->ctrl y ->... z (and ->addr) > don't. So it's not due to smart compilers, just the fact that the > other two cases seem unrelated to the problem being solved, and > including them might introduce some unsoundness (not that I have > checked if they do). > > Mathematically I imagine the computation graph between register > accesses/computations r_1,...,r_n and memory accesses m_1,...m_k that > has an unlabeled edge m_i -> r_j when m_i loads some data that is an > input to the access/computation r_j, similarly it has an unlabeled > edge r_i -> r_j when the result of r_i is used as an input of r_j, > and finally r_i ->data/ctrl/addr m_j when the value computed by r_j is > the address/data or affects the ctrl of m_j. So for example, if 'int > a' were to declare a register, then > int a = READ_ONCE(&x); > if (a == 5) { WRITE_ONCE(&y,5); } > would have something like (*) 4 events (*= since I'm avoiding fully > formalizing the graph model here I don't really define to which extent > one splits up register reads, computations, etc., so I'll do it > somewhat arbitrarily) > m_1 = READ_ONCE(&x) > r_1 = store the result of m_1 in a > r_2 = compare the content of a to 5 > m_2 = WRITE_ONCE(&y, 5) > > with the edges m_1 -> r_1 -> r_2 ->ctrl m_2 > > Then in the LKMM graph, we would have m_i ->ctrl m_j (or data or addr) > iff there is a path m_i -> r_x1 -> ... -> r_xl ->ctrl m_j (or data or > addr). > So in the example above, m_1 ->ctrl m_2. > > Now what we would do is look at what happens if the compiler/a > tool/me/whatever interprets 'int a' as a memory location. Instead of > r_1 and r_2 from above, we would have something like > > m_1 = READ_ONCE(&x) > r_3 = the result of m_1 (stored in a CPU register) > m_3 = a := the r_3 result > m_4 = load from a > r_4 = the result of m_4 (stored in a CPU register) > m_2 = WRITE_ONCE(&y, 5) > > with m_1 -> r_3 ->data -> m_3 ->rfi m_4 -> r_4 ->ctrl m_2 > and hence > m_1 ->data m_3 ->rfi m_4 ->ctrl m_2 > > What the patch does is make sure that even under this interpretation, > we still have m_1 ->ctrl m_2 > Note that this ->data ->rfi applies in every case where something is > considered a register is turned into a memory location, because those > cases always introduce a store with a fixed address (that memory > location) where the data is the current content of the register, which > is then read (internally) at the next time that data is picked up. A > store to register never becomes a ctrl or addr edge, hence they are > not included in the patch. > Let me see if I can describe your approach in a more formal way (look at me, pretending to know formal methods ;-)) Bascially, what you are saying is that for every valid litmus test with dependencies (carried by local variables i.e. registers), there exists a way to rewrite the litmus test by treating all (or any number of) local variables as memory locations. Lets say the set of the litmus tests to start with is L and the set of the rewritten ones is L'is. Here is a rewrite rule I come up with: 1) for every local named <n> on processor <P>, add an extra memory location (maybe named <n>_<P>) in the processor function, for example P0(int *x, int *y) { int a ...; } became P0(int *x, int *y, int *a_P0) { } 2) for each <n> in 1), for each assignment of <n>, say: int <n> = <expr>; // H or <n> = <expr>; // H rewrite it to int computer_register_<seq> = <expr>; // A *<n>_<P> = computer_register_<seq>; // B where <seq> is a self increasing counter that increases every step or the rewrite. This step introduces an extra edge A ->data B. 3) for each <n> in 1), for each the read of <n>, say: Expr; // T rewrite it to int computer_register_<seq> = *<n>_<P>; // C Expr[<n>/computer_register_<seq>]; // D where <seq> is a self increasing counter that increases every step or the rewrite. "M[x/y]" means changing the expression by replace all appearance of variable x into y, e.g. (n1 = n + 1)[n/y] is n1 = y + 1. Note that for every rewrite 3), there must exists a corresponding rewrite 2), that C reads the value stored by the B of the rewrite 2). This is because the litmus tests in set L are valid, all variables must be assigned a value before used. This step introduces an extra edge B ->rfi C, and also for every dependency between H -> T in the old litmus test, it preserves between C -> D in the new litmus test. 4) for each <n> and <P> in 1), for each expression in the exist clause, say: Expr rewrite it to Expr[<P>:<n>/<n>_<P>] For example, by these rules, the following litmus test: P0(int *x, int *y) { int r = READ_ONCE(*x); WRITE_ONCE(*y, r); } exists(0:r = 1) is rewritten into P0(int *x, int *y, int *r_P0) { int computer_register_0 = READ_ONCE(*x); // by 2) *r_P0 = computer_register_0; // by 2) int computer_register_1 = *r_P0; // by 3) WRITE_ONCE(*y, computer_register_1); // by 3) } exists(r_P0 = 1) // by 4) It's obvious that the rewritten litmus tests are valid, and for every dependency H ->dep T in litmus tests of set L there must exists a A ->data ->rfi ->dep D in L' ("dep" is not the same as in linux-kernel.cat). And note since L' is a set of valid litmus tests, we can do another whole rewrite to L' and get L'' which will contains ->data->rfi->data->rfi->dep, and this covers the (data;rfi)* ;-) Now the hard part, since the rewrite is only one-direction, i.e L => L', or more preciselly the transverse closure of the rewrite is one-direction. We can only prove that if there exists a valid litmus test with dependency ->dep, we can construct a number of litmus tests with (->data ->rfi)*->dep. But to accept the new rules in your patch, we need the opposite direction, that is, if there exists a (->data ->rfi)* ->dep, we can somehow find a litmus test that preserves anything else but reduce (->data->rfi)*->dep into ->dep. (I'm not sure the following is sound, since some more work is needed) The first observation is that we can ignore ->data [Marked] ->rfi, because if we can reduce (->data [Plain] ->rfi)* ->data [Marked] ->rfi -> (->data [Plain] ->rfi)* ->dep to ->data [Marked] ->rfi ->dep we get "hb", which already has the ordering we want. Now we can focus on ->data [Plain] ->rfi ->dep, e.g. x = READ_ONCE(..); // A *<N> = <expr>; // B, contains x r = Expr; // C, Expr contains *<N> WRITE_ONCE(.., r); // D We need to assume that the litmus tests are data-race free, since we only care about valid litmus tests, then it's OK. The rewritten looks like the following: int computer_register_<seq>; x = READ_ONCE(..); // A computer_register = <expr>; // B, contains x // ^^^ basically reverse rewrite of the 2) above r = Expr[*<N>/computer_register_<seq>]; // C // ^^^ basically reverse rewrite of the 3) above *<N> = computer_register_<seq>; // R // ^^^ need this to keep the global memory effect WRITE_ONCE(.., r); // D We need the data-race free assumption to replace a memory cell into a local variable. By this rewrite, We reduce A ->data B ->rfi C ->dep D into A ->dep D. A few details are missing here, for example we may have multip Cs and need to rewrite all of them, and we need normalization work like converting litmus tests into canonical forms (e.g change the += into temporary variables and assignment). Also we need to prove that no more edges (or edges we care) are added. But these look obvious to me, and enough formally digging for me today ;-) As a result, the rewrite operation on L is proved to be isomorphic! ;-) In the sense that we think reducing to (->data [Marked] ->rfi)* ->dep is good enough ;-) Now for every litmus test with (->data ->rfi)* ->dep we can rewrite it into another litmus test preserving all other edges except replacing the above with (->data [Marked] ->rfi)* ->dep and the rewrite only contains replacing non-data-race accesses with local variables and in the sense of C standards and maybe other model tools these litmus tests are equivalent. > Please let me know if this is convincing and clear. If so we could > link to your e-mail and my response to provide context, without having > to put a huge explanation into the commit message. If that's not > desirable I can also think about how to compress this into a shorter > explanation. > My intution now is the rewrite explanation above, which is good enough for me but not sure for other audience. Thank you for keep explaining this to me ;-) Regards, Boqun > Best wishes, > Jonas > > PS: > > > sometimes I'm just slow to understand things ;-) > > Well, welcome in the club :D
On Tue, Dec 06, 2022 at 05:43:49PM -0800, Boqun Feng wrote: > On Mon, Dec 05, 2022 at 01:42:46PM +0000, Jonas Oberhauser wrote: > [...] > > > Besides, could you also explain a little bit why only "data;rfi" can > > > be "carry-dep" but "ctrl;rfi" and "addr;rfi" cannot? I think it's > > > because there are special cases when compilers can figure out a > > > condition being true or an address being constant therefore break > > > the dependency > > > > Oh, good question. A bit hard for me to write down the answer clearly > > (which some people will claim that I don't understand it well myself, > > but I beg to differ :) :( :) ). > > > > In a nutshell, it's because x ->data [Plain] ->rfi y ->... z > > fulfils the same role as storing something in a register and then > > using it in a subsequent computation; x ->ctrl y ->... z (and ->addr) > > don't. So it's not due to smart compilers, just the fact that the > > other two cases seem unrelated to the problem being solved, and > > including them might introduce some unsoundness (not that I have > > checked if they do). > > > > Mathematically I imagine the computation graph between register > > accesses/computations r_1,...,r_n and memory accesses m_1,...m_k that > > has an unlabeled edge m_i -> r_j when m_i loads some data that is an > > input to the access/computation r_j, similarly it has an unlabeled > > edge r_i -> r_j when the result of r_i is used as an input of r_j, > > and finally r_i ->data/ctrl/addr m_j when the value computed by r_j is > > the address/data or affects the ctrl of m_j. So for example, if 'int > > a' were to declare a register, then > > int a = READ_ONCE(&x); > > if (a == 5) { WRITE_ONCE(&y,5); } > > would have something like (*) 4 events (*= since I'm avoiding fully > > formalizing the graph model here I don't really define to which extent > > one splits up register reads, computations, etc., so I'll do it > > somewhat arbitrarily) > > m_1 = READ_ONCE(&x) > > r_1 = store the result of m_1 in a > > r_2 = compare the content of a to 5 > > m_2 = WRITE_ONCE(&y, 5) > > > > with the edges m_1 -> r_1 -> r_2 ->ctrl m_2 > > > > Then in the LKMM graph, we would have m_i ->ctrl m_j (or data or addr) > > iff there is a path m_i -> r_x1 -> ... -> r_xl ->ctrl m_j (or data or > > addr). > > So in the example above, m_1 ->ctrl m_2. > > > > Now what we would do is look at what happens if the compiler/a > > tool/me/whatever interprets 'int a' as a memory location. Instead of > > r_1 and r_2 from above, we would have something like > > > > m_1 = READ_ONCE(&x) > > r_3 = the result of m_1 (stored in a CPU register) > > m_3 = a := the r_3 result > > m_4 = load from a > > r_4 = the result of m_4 (stored in a CPU register) > > m_2 = WRITE_ONCE(&y, 5) > > > > with m_1 -> r_3 ->data -> m_3 ->rfi m_4 -> r_4 ->ctrl m_2 > > and hence > > m_1 ->data m_3 ->rfi m_4 ->ctrl m_2 > > > > What the patch does is make sure that even under this interpretation, > > we still have m_1 ->ctrl m_2 > > Note that this ->data ->rfi applies in every case where something is > > considered a register is turned into a memory location, because those > > cases always introduce a store with a fixed address (that memory > > location) where the data is the current content of the register, which > > is then read (internally) at the next time that data is picked up. A > > store to register never becomes a ctrl or addr edge, hence they are > > not included in the patch. > > > > Let me see if I can describe your approach in a more formal way (look at > me, pretending to know formal methods ;-)) > > Bascially, what you are saying is that for every valid litmus test with > dependencies (carried by local variables i.e. registers), there exists a > way to rewrite the litmus test by treating all (or any number of) local > variables as memory locations. > > Lets say the set of the litmus tests to start with is L and the set of > the rewritten ones is L'is. Here is a rewrite rule I come up with: > > 1) for every local named <n> on processor <P>, add an extra memory > location (maybe named <n>_<P>) in the processor function, for > example > > P0(int *x, int *y) { > int a ...; > } > > became > P0(int *x, int *y, int *a_P0) { > } > > 2) for each <n> in 1), for each assignment of <n>, say: > > int <n> = <expr>; // H > > or > > <n> = <expr>; // H > > rewrite it to > > int computer_register_<seq> = <expr>; // A > *<n>_<P> = computer_register_<seq>; // B > > where <seq> is a self increasing counter that increases every > step or the rewrite. > > This step introduces an extra edge A ->data B. > > 3) for each <n> in 1), for each the read of <n>, say: > > Expr; // T > > rewrite it to > > int computer_register_<seq> = *<n>_<P>; // C > Expr[<n>/computer_register_<seq>]; // D > > where <seq> is a self increasing counter that increases every > step or the rewrite. > > "M[x/y]" means changing the expression by replace all appearance > of variable x into y, e.g. (n1 = n + 1)[n/y] is n1 = y + 1. > > Note that for every rewrite 3), there must exists a > corresponding rewrite 2), that C reads the value stored by the > B of the rewrite 2). This is because the litmus tests in set L > are valid, all variables must be assigned a value before used. > > This step introduces an extra edge B ->rfi C, and also for every > dependency between H -> T in the old litmus test, it preserves > between C -> D in the new litmus test. > > 4) for each <n> and <P> in 1), for each expression in the exist > clause, say: > > Expr > > rewrite it to > > Expr[<P>:<n>/<n>_<P>] > > For example, by these rules, the following litmus test: > > P0(int *x, int *y) { > int r = READ_ONCE(*x); > WRITE_ONCE(*y, r); > } > exists(0:r = 1) > > is rewritten into > > P0(int *x, int *y, int *r_P0) { > int computer_register_0 = READ_ONCE(*x); // by 2) > *r_P0 = computer_register_0; // by 2) > int computer_register_1 = *r_P0; // by 3) > WRITE_ONCE(*y, computer_register_1); // by 3) > } > exists(r_P0 = 1) // by 4) > > > It's obvious that the rewritten litmus tests are valid, and for every > dependency > > H ->dep T in litmus tests of set L > > there must exists a > > A ->data ->rfi ->dep D in L' > > ("dep" is not the same as in linux-kernel.cat). > > And note since L' is a set of valid litmus tests, we can do another > whole rewrite to L' and get L'' which will contains > ->data->rfi->data->rfi->dep, and this covers the (data;rfi)* ;-) > > > Now the hard part, since the rewrite is only one-direction, i.e L => L', > or more preciselly the transverse closure of the rewrite is > one-direction. We can only prove that if there exists a valid litmus > test with dependency ->dep, we can construct a number of litmus tests > with (->data ->rfi)*->dep. > > But to accept the new rules in your patch, we need the opposite > direction, that is, if there exists a (->data ->rfi)* ->dep, we can > somehow find a litmus test that preserves anything else but reduce > > (->data->rfi)*->dep > into > > ->dep. > > (I'm not sure the following is sound, since some more work is needed) > > The first observation is that we can ignore ->data [Marked] ->rfi, > because if we can reduce > > (->data [Plain] ->rfi)* ->data [Marked] ->rfi -> (->data [Plain] ->rfi)* ->dep > > to > > ->data [Marked] ->rfi ->dep > > we get "hb", which already has the ordering we want. > > Now we can focus on ->data [Plain] ->rfi ->dep, e.g. > > x = READ_ONCE(..); // A > *<N> = <expr>; // B, contains x > r = Expr; // C, Expr contains *<N> > WRITE_ONCE(.., r); // D > > We need to assume that the litmus tests are data-race free, since we > only care about valid litmus tests, then it's OK. The rewritten looks > like the following: > > int computer_register_<seq>; > x = READ_ONCE(..); // A > computer_register = <expr>; // B, contains x > // ^^^ basically reverse rewrite of the 2) above > > r = Expr[*<N>/computer_register_<seq>]; // C > // ^^^ basically reverse rewrite of the 3) above > > *<N> = computer_register_<seq>; // R > // ^^^ need this to keep the global memory effect > > WRITE_ONCE(.., r); // D > > We need the data-race free assumption to replace a memory cell into a > local variable. > > By this rewrite, We reduce > A ->data B ->rfi C ->dep D > into > A ->dep D. > > A few details are missing here, for example we may have multip Cs and > need to rewrite all of them, and we need normalization work like > converting litmus tests into canonical forms (e.g change the += into > temporary variables and assignment). Also we need to prove that no more > edges (or edges we care) are added. But these look obvious to me, and > enough formally digging for me today ;-) > > As a result, the rewrite operation on L is proved to be isomorphic! ;-) > In the sense that we think reducing to (->data [Marked] ->rfi)* ->dep > is good enough ;-) > > Now for every litmus test with > > (->data ->rfi)* ->dep > > we can rewrite it into another litmus test preserving all other edges > except replacing the above with > > (->data [Marked] ->rfi)* ->dep > > and the rewrite only contains replacing non-data-race accesses with > local variables and in the sense of C standards and maybe other model > tools these litmus tests are equivalent. > > > Please let me know if this is convincing and clear. If so we could > > link to your e-mail and my response to provide context, without having > > to put a huge explanation into the commit message. If that's not > > desirable I can also think about how to compress this into a shorter > > explanation. > > > > My intution now is the rewrite explanation above, which is good enough > for me but not sure for other audience. Thank you for keep explaining > this to me ;-) > > > Regards, > Boqun > Paul, I think Jonas and I reach out to some degree of argeement on more details of this change, at least I learned a lot ;-) Could you add the following lines in the commit log if you haven't send a PR? More deep details can be found at the LKML discussion[1] [1]: https://lore.kernel.org/lkml/d86295788ad14a02874ab030ddb8a6f8@huawei.com/ I think Jonas' email covers most of the details behind the change and can server as a "Quick Quiz" for the readers of the commit log ;-) Thoughts? Regards, Boqun > > Best wishes, > > Jonas > > > > PS: > > > > > sometimes I'm just slow to understand things ;-) > > > > Well, welcome in the club :D
On Tue, Dec 06, 2022 at 12:52:47PM -0800, Boqun Feng wrote: > On Tue, Dec 06, 2022 at 12:46:58PM -0800, Boqun Feng wrote: > > Thank you, Alan! One question though, can a "smart" compiler optimize > > out the case below, with the same logic? > > > > void P0(int *x, int *y, int *a) > > { > > int r1, r2; > > > > r1 = READ_ONCE(*x); // A > > > > *a = r1 & 0xffff; // B > > > > r2 = *a & 0xffff0000; // C > > > > WRITE_ONCE(*y, r2); // D > > > > } > > > > I think we have A ->data B ->rfi C ->data D, however a "smart" compiler > > can figure out that r2 is actually zero, right? And the code get > > optimized to: > > > > r1 = READ_ONCE(*x); > > r2 = 0; > > WRITE_ONCE(*y, r2); > > *a = r1 & 0xffff; > > > > and break the dependency. Yes, that could happen. > > I know that our memory model is actually unware of the differences of > > syntatics dependencies vs semantics syntatics, so one may argue that in > > the (data; rfi) example above the compiler optimization is outside the > > scope of LKMM, but won't the same reasoning apply to the (addr; rfi) > > example from you? The WRITE_ONCE() _syntatically_ depends on load of > > a[r1], therefore even a "smart" compiler can figure out the value, LKMM > > I guess it should be that r2 (i.e. the load of a[r1]) _syntatically_ > depends on the value of r1. Yes. But more to the point, the LKMM already has this problem for ordinary dependencies. If you do: r1 = READ_ONCE(*x); r2 = r1 & 0x0000ffff; r3 = r2 & 0xffff0000; WRITE_ONCE(*y, r3); then the LKMM will think there is a dependency (because there is a _syntactic_ dependency), but the compiler is likely to realize that there isn't a _semantic_ dependency and will destroy the ordering. We warn people about this possibility, and the same warning applies to dependencies carried by plain accesses. So I don't think this is a reason to object to Jonas's carries-dep relation. Alan
On Thu, Dec 08, 2022 at 01:06:20PM -0800, Boqun Feng wrote: > On Tue, Dec 06, 2022 at 05:43:49PM -0800, Boqun Feng wrote: > > On Mon, Dec 05, 2022 at 01:42:46PM +0000, Jonas Oberhauser wrote: > > [...] > > > > Besides, could you also explain a little bit why only "data;rfi" can > > > > be "carry-dep" but "ctrl;rfi" and "addr;rfi" cannot? I think it's > > > > because there are special cases when compilers can figure out a > > > > condition being true or an address being constant therefore break > > > > the dependency > > > > > > Oh, good question. A bit hard for me to write down the answer clearly > > > (which some people will claim that I don't understand it well myself, > > > but I beg to differ :) :( :) ). > > > > > > In a nutshell, it's because x ->data [Plain] ->rfi y ->... z > > > fulfils the same role as storing something in a register and then > > > using it in a subsequent computation; x ->ctrl y ->... z (and ->addr) > > > don't. So it's not due to smart compilers, just the fact that the > > > other two cases seem unrelated to the problem being solved, and > > > including them might introduce some unsoundness (not that I have > > > checked if they do). > > > > > > Mathematically I imagine the computation graph between register > > > accesses/computations r_1,...,r_n and memory accesses m_1,...m_k that > > > has an unlabeled edge m_i -> r_j when m_i loads some data that is an > > > input to the access/computation r_j, similarly it has an unlabeled > > > edge r_i -> r_j when the result of r_i is used as an input of r_j, > > > and finally r_i ->data/ctrl/addr m_j when the value computed by r_j is > > > the address/data or affects the ctrl of m_j. So for example, if 'int > > > a' were to declare a register, then > > > int a = READ_ONCE(&x); > > > if (a == 5) { WRITE_ONCE(&y,5); } > > > would have something like (*) 4 events (*= since I'm avoiding fully > > > formalizing the graph model here I don't really define to which extent > > > one splits up register reads, computations, etc., so I'll do it > > > somewhat arbitrarily) > > > m_1 = READ_ONCE(&x) > > > r_1 = store the result of m_1 in a > > > r_2 = compare the content of a to 5 > > > m_2 = WRITE_ONCE(&y, 5) > > > > > > with the edges m_1 -> r_1 -> r_2 ->ctrl m_2 > > > > > > Then in the LKMM graph, we would have m_i ->ctrl m_j (or data or addr) > > > iff there is a path m_i -> r_x1 -> ... -> r_xl ->ctrl m_j (or data or > > > addr). > > > So in the example above, m_1 ->ctrl m_2. > > > > > > Now what we would do is look at what happens if the compiler/a > > > tool/me/whatever interprets 'int a' as a memory location. Instead of > > > r_1 and r_2 from above, we would have something like > > > > > > m_1 = READ_ONCE(&x) > > > r_3 = the result of m_1 (stored in a CPU register) > > > m_3 = a := the r_3 result > > > m_4 = load from a > > > r_4 = the result of m_4 (stored in a CPU register) > > > m_2 = WRITE_ONCE(&y, 5) > > > > > > with m_1 -> r_3 ->data -> m_3 ->rfi m_4 -> r_4 ->ctrl m_2 > > > and hence > > > m_1 ->data m_3 ->rfi m_4 ->ctrl m_2 > > > > > > What the patch does is make sure that even under this interpretation, > > > we still have m_1 ->ctrl m_2 > > > Note that this ->data ->rfi applies in every case where something is > > > considered a register is turned into a memory location, because those > > > cases always introduce a store with a fixed address (that memory > > > location) where the data is the current content of the register, which > > > is then read (internally) at the next time that data is picked up. A > > > store to register never becomes a ctrl or addr edge, hence they are > > > not included in the patch. > > > > > > > Let me see if I can describe your approach in a more formal way (look at > > me, pretending to know formal methods ;-)) > > > > Bascially, what you are saying is that for every valid litmus test with > > dependencies (carried by local variables i.e. registers), there exists a > > way to rewrite the litmus test by treating all (or any number of) local > > variables as memory locations. > > > > Lets say the set of the litmus tests to start with is L and the set of > > the rewritten ones is L'is. Here is a rewrite rule I come up with: > > > > 1) for every local named <n> on processor <P>, add an extra memory > > location (maybe named <n>_<P>) in the processor function, for > > example > > > > P0(int *x, int *y) { > > int a ...; > > } > > > > became > > P0(int *x, int *y, int *a_P0) { > > } > > > > 2) for each <n> in 1), for each assignment of <n>, say: > > > > int <n> = <expr>; // H > > > > or > > > > <n> = <expr>; // H > > > > rewrite it to > > > > int computer_register_<seq> = <expr>; // A > > *<n>_<P> = computer_register_<seq>; // B > > > > where <seq> is a self increasing counter that increases every > > step or the rewrite. > > > > This step introduces an extra edge A ->data B. > > > > 3) for each <n> in 1), for each the read of <n>, say: > > > > Expr; // T > > > > rewrite it to > > > > int computer_register_<seq> = *<n>_<P>; // C > > Expr[<n>/computer_register_<seq>]; // D > > > > where <seq> is a self increasing counter that increases every > > step or the rewrite. > > > > "M[x/y]" means changing the expression by replace all appearance > > of variable x into y, e.g. (n1 = n + 1)[n/y] is n1 = y + 1. > > > > Note that for every rewrite 3), there must exists a > > corresponding rewrite 2), that C reads the value stored by the > > B of the rewrite 2). This is because the litmus tests in set L > > are valid, all variables must be assigned a value before used. > > > > This step introduces an extra edge B ->rfi C, and also for every > > dependency between H -> T in the old litmus test, it preserves > > between C -> D in the new litmus test. > > > > 4) for each <n> and <P> in 1), for each expression in the exist > > clause, say: > > > > Expr > > > > rewrite it to > > > > Expr[<P>:<n>/<n>_<P>] > > > > For example, by these rules, the following litmus test: > > > > P0(int *x, int *y) { > > int r = READ_ONCE(*x); > > WRITE_ONCE(*y, r); > > } > > exists(0:r = 1) > > > > is rewritten into > > > > P0(int *x, int *y, int *r_P0) { > > int computer_register_0 = READ_ONCE(*x); // by 2) > > *r_P0 = computer_register_0; // by 2) > > int computer_register_1 = *r_P0; // by 3) > > WRITE_ONCE(*y, computer_register_1); // by 3) > > } > > exists(r_P0 = 1) // by 4) > > > > > > It's obvious that the rewritten litmus tests are valid, and for every > > dependency > > > > H ->dep T in litmus tests of set L > > > > there must exists a > > > > A ->data ->rfi ->dep D in L' > > > > ("dep" is not the same as in linux-kernel.cat). > > > > And note since L' is a set of valid litmus tests, we can do another > > whole rewrite to L' and get L'' which will contains > > ->data->rfi->data->rfi->dep, and this covers the (data;rfi)* ;-) > > > > > > Now the hard part, since the rewrite is only one-direction, i.e L => L', > > or more preciselly the transverse closure of the rewrite is > > one-direction. We can only prove that if there exists a valid litmus > > test with dependency ->dep, we can construct a number of litmus tests > > with (->data ->rfi)*->dep. > > > > But to accept the new rules in your patch, we need the opposite > > direction, that is, if there exists a (->data ->rfi)* ->dep, we can > > somehow find a litmus test that preserves anything else but reduce > > > > (->data->rfi)*->dep > > into > > > > ->dep. > > > > (I'm not sure the following is sound, since some more work is needed) > > > > The first observation is that we can ignore ->data [Marked] ->rfi, > > because if we can reduce > > > > (->data [Plain] ->rfi)* ->data [Marked] ->rfi -> (->data [Plain] ->rfi)* ->dep > > > > to > > > > ->data [Marked] ->rfi ->dep > > > > we get "hb", which already has the ordering we want. > > > > Now we can focus on ->data [Plain] ->rfi ->dep, e.g. > > > > x = READ_ONCE(..); // A > > *<N> = <expr>; // B, contains x > > r = Expr; // C, Expr contains *<N> > > WRITE_ONCE(.., r); // D > > > > We need to assume that the litmus tests are data-race free, since we > > only care about valid litmus tests, then it's OK. The rewritten looks > > like the following: > > > > int computer_register_<seq>; > > x = READ_ONCE(..); // A > > computer_register = <expr>; // B, contains x > > // ^^^ basically reverse rewrite of the 2) above > > > > r = Expr[*<N>/computer_register_<seq>]; // C > > // ^^^ basically reverse rewrite of the 3) above > > > > *<N> = computer_register_<seq>; // R > > // ^^^ need this to keep the global memory effect > > > > WRITE_ONCE(.., r); // D > > > > We need the data-race free assumption to replace a memory cell into a > > local variable. > > > > By this rewrite, We reduce > > A ->data B ->rfi C ->dep D > > into > > A ->dep D. > > > > A few details are missing here, for example we may have multip Cs and > > need to rewrite all of them, and we need normalization work like > > converting litmus tests into canonical forms (e.g change the += into > > temporary variables and assignment). Also we need to prove that no more > > edges (or edges we care) are added. But these look obvious to me, and > > enough formally digging for me today ;-) > > > > As a result, the rewrite operation on L is proved to be isomorphic! ;-) > > In the sense that we think reducing to (->data [Marked] ->rfi)* ->dep > > is good enough ;-) > > > > Now for every litmus test with > > > > (->data ->rfi)* ->dep > > > > we can rewrite it into another litmus test preserving all other edges > > except replacing the above with > > > > (->data [Marked] ->rfi)* ->dep > > > > and the rewrite only contains replacing non-data-race accesses with > > local variables and in the sense of C standards and maybe other model > > tools these litmus tests are equivalent. > > > > > Please let me know if this is convincing and clear. If so we could > > > link to your e-mail and my response to provide context, without having > > > to put a huge explanation into the commit message. If that's not > > > desirable I can also think about how to compress this into a shorter > > > explanation. > > > > > > > My intution now is the rewrite explanation above, which is good enough > > for me but not sure for other audience. Thank you for keep explaining > > this to me ;-) > > > > > > Regards, > > Boqun > > > > Paul, I think Jonas and I reach out to some degree of argeement on more > details of this change, at least I learned a lot ;-) > > Could you add the following lines in the commit log if you haven't send > a PR? > > More deep details can be found at the LKML discussion[1] > > [1]: https://lore.kernel.org/lkml/d86295788ad14a02874ab030ddb8a6f8@huawei.com/ > > I think Jonas' email covers most of the details behind the change and > can server as a "Quick Quiz" for the readers of the commit log ;-) > > Thoughts? I am not so sure that any such quiz would be quick, but life is often like that for memory models. ;-) Thank you all, and unless there are further objections, I will update this on the next rebase. Thanx, Paul > Regards, > Boqun > > > > Best wishes, > > > Jonas > > > > > > PS: > > > > > > > sometimes I'm just slow to understand things ;-) > > > > > > Well, welcome in the club :D
diff --git a/tools/memory-model/Documentation/explanation.txt b/tools/memory-model/Documentation/explanation.txt index e901b47236c3..8e7085238470 100644 --- a/tools/memory-model/Documentation/explanation.txt +++ b/tools/memory-model/Documentation/explanation.txt @@ -2575,7 +2575,7 @@ smp_store_release() -- which is basically how the Linux kernel treats them. Although we said that plain accesses are not linked by the ppo -relation, they do contribute to it indirectly. Namely, when there is +relation, they do contribute to it indirectly. Firstly, when there is an address dependency from a marked load R to a plain store W, followed by smp_wmb() and then a marked store W', the LKMM creates a ppo link from R to W'. The reasoning behind this is perhaps a little @@ -2584,6 +2584,13 @@ for this source code in which W' could execute before R. Just as with pre-bounding by address dependencies, it is possible for the compiler to undermine this relation if sufficient care is not taken. +Secondly, plain accesses can carry dependencies: If a data dependency +links a marked load R to a store W, and the store is read by a load R' +from the same thread, then the data loaded by R' depends on the data +loaded originally by R. Thus, if R' is linked to any access X by a +dependency, R is also linked to access X by the same dependency, even +if W' or R' (or both!) are plain. + There are a few oddball fences which need special treatment: smp_mb__before_atomic(), smp_mb__after_atomic(), and smp_mb__after_spinlock(). The LKMM uses fence events with special diff --git a/tools/memory-model/linux-kernel.bell b/tools/memory-model/linux-kernel.bell index 65c32ca9d5ea..5f0b98c1ab81 100644 --- a/tools/memory-model/linux-kernel.bell +++ b/tools/memory-model/linux-kernel.bell @@ -76,3 +76,9 @@ flag ~empty different-values(srcu-rscs) as srcu-bad-nesting let Marked = (~M) | IW | Once | Release | Acquire | domain(rmw) | range(rmw) | LKR | LKW | UL | LF | RL | RU let Plain = M \ Marked + +(* Redefine dependencies to include those carried through plain accesses *) +let carry-dep = (data ; rfi)* +let addr = carry-dep ; addr +let ctrl = carry-dep ; ctrl +let data = carry-dep ; data diff --git a/tools/memory-model/litmus-tests/dep+plain.litmus b/tools/memory-model/litmus-tests/dep+plain.litmus new file mode 100644 index 000000000000..ebf84daa9a59 --- /dev/null +++ b/tools/memory-model/litmus-tests/dep+plain.litmus @@ -0,0 +1,31 @@ +C dep+plain + +(* + * Result: Never + * + * This litmus test demonstrates that in LKMM, plain accesses + * carry dependencies much like accesses to registers: + * The data stored to *z1 and *z2 by P0() originates from P0()'s + * READ_ONCE(), and therefore using that data to compute the + * conditional of P0()'s if-statement creates a control dependency + * from that READ_ONCE() to P0()'s WRITE_ONCE(). + *) + +{} + +P0(int *x, int *y, int *z1, int *z2) +{ + int a = READ_ONCE(*x); + *z1 = a; + *z2 = *z1; + if (*z2 == 1) + WRITE_ONCE(*y, 1); +} + +P1(int *x, int *y) +{ + int r = smp_load_acquire(y); + smp_store_release(x, r); +} + +exists (x=1 /\ y=1)