Message ID | 20231231082955.16516-4-guoren@kernel.org |
---|---|
State | New |
Headers |
From: guoren@kernel.org
To: paul.walmsley@sifive.com, palmer@dabbelt.com, guoren@kernel.org, panqinglin2020@iscas.ac.cn, bjorn@rivosinc.com, conor.dooley@microchip.com, leobras@redhat.com, peterz@infradead.org, keescook@chromium.org, wuwei2016@iscas.ac.cn, xiaoguang.xing@sophgo.com, chao.wei@sophgo.com, unicorn_wang@outlook.com, uwu@icenowy.me, jszhang@kernel.org, wefu@redhat.com, atishp@atishpatra.org, ajones@ventanamicro.com
Cc: linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org, Guo Ren <guoren@linux.alibaba.com>
Subject: [PATCH V2 3/3] riscv: xchg: Prefetch the destination word for sc.w
Date: Sun, 31 Dec 2023 03:29:53 -0500
Message-Id: <20231231082955.16516-4-guoren@kernel.org>
In-Reply-To: <20231231082955.16516-1-guoren@kernel.org>
References: <20231231082955.16516-1-guoren@kernel.org>
X-Mailer: git-send-email 2.40.1 |
Series | riscv: Add Zicbop & prefetchw support |
Commit Message
Guo Ren
Dec. 31, 2023, 8:29 a.m. UTC
From: Guo Ren <guoren@linux.alibaba.com>

The cost of changing a cacheline from shared to exclusive state can be
significant, especially when this is triggered by an exclusive store,
since it may result in having to retry the transaction.

This patch makes use of prefetch.w to prefetch cachelines for write
prior to lr/sc loops when using the xchg_small atomic routine.

This patch is inspired by commit: 0ea366f5e1b6 ("arm64: atomics:
prefetch the destination word for write prior to stxr").

Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
---
 arch/riscv/include/asm/cmpxchg.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
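The PREFETCHW_ASM() helper used in the diff below is not defined by this patch; it comes from the Zicbop patches earlier in this series, which are not shown on this page. The sketch below is only an illustration of the idea: the macro name, the encoding helper, and the example function are my assumptions, and the real helper presumably also patches itself out via ALTERNATIVE() when Zicbop is absent, which is omitted here.

/*
 * Illustrative stand-in for the series' PREFETCHW_ASM() helper.
 * Zicbop encodes prefetch.w in the ORI opcode space (opcode 0x13,
 * funct3 0b110, rd = x0); imm[4:0] = 0x3 selects "prefetch for write"
 * and imm[11:5] holds a 32-byte-granular offset (0 here).
 */
#define PREFETCH_W_SKETCH(base)	".insn i 0x13, 0x6, x0, " base ", 0x3\n"

/* Emit a write-prefetch for the cache line containing *p. */
static inline void prefetchw_sketch(const void *p)
{
	__asm__ __volatile__ (PREFETCH_W_SKETCH("%0") : : "r" (p));
}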
Comments
On Sun, Dec 31, 2023 at 03:29:53AM -0500, guoren@kernel.org wrote:
> From: Guo Ren <guoren@linux.alibaba.com>
>
> [...]
>
> diff --git a/arch/riscv/include/asm/cmpxchg.h b/arch/riscv/include/asm/cmpxchg.h
> index 26cea2395aae..d7b9d7951f08 100644
> --- a/arch/riscv/include/asm/cmpxchg.h
> +++ b/arch/riscv/include/asm/cmpxchg.h
> @@ -10,6 +10,7 @@
>
>  #include <asm/barrier.h>
>  #include <asm/fence.h>
> +#include <asm/processor.h>
>
>  #define __arch_xchg_masked(prepend, append, r, p, n) \

Are you sure this is based on v6.7-rc7? Because I don't see this macro.

>  ({ \
> @@ -23,6 +24,7 @@
>  \
>  	__asm__ __volatile__ ( \
>  	       prepend \
> +	       PREFETCHW_ASM(%5) \
>  	       "0:	lr.w %0, %2\n" \
>  	       "	and  %1, %0, %z4\n" \
>  	       "	or   %1, %1, %z3\n" \
> @@ -30,7 +32,7 @@
>  	       "	bnez %1, 0b\n" \
>  	       append \
>  	       : "=&r" (__retx), "=&r" (__rc), "+A" (*(__ptr32b)) \
> -	       : "rJ" (__newx), "rJ" (~__mask) \
> +	       : "rJ" (__newx), "rJ" (~__mask), "rJ" (__ptr32b) \

I'm pretty sure we don't want to allow the J constraint for __ptr32b.

>  	       : "memory"); \
>  \
>  	r = (__typeof__(*(p)))((__retx & __mask) >> __s); \
> --
> 2.40.1
>

Thanks,
drew
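A brief note on the constraint being questioned above: in the RISC-V GCC backend, the machine constraint "J" matches only the integer constant zero, so "rJ" lets the compiler substitute the zero register x0 when an operand folds to 0. That is exactly what the "%z" operands above rely on for __newx and ~__mask, but it is not something you want for the prefetch base address, which should always sit in a real register. The toy example below is my own illustration, not taken from any revision of this patch; it should build with a riscv64 cross compiler, e.g. riscv64-linux-gnu-gcc -O2 -c.

/* Value operand: "rJ" plus the "%z" modifier lets a constant 0 become x0. */
long or_with_maybe_zero(long a, long b)
{
	long r;

	__asm__ ("or %0, %1, %z2" : "=r" (r) : "r" (a), "rJ" (b));
	return r;
}

/* Address operand: force a real register with plain "r", never "rJ". */
long load_word(const int *p)
{
	long r;

	__asm__ ("lw %0, 0(%1)" : "=r" (r) : "r" (p));
	return r;
}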
On Tue, Jan 2, 2024 at 7:19 PM Andrew Jones <ajones@ventanamicro.com> wrote:
> On Sun, Dec 31, 2023 at 03:29:53AM -0500, guoren@kernel.org wrote:
> > [...]
> >
> >  #define __arch_xchg_masked(prepend, append, r, p, n) \
>
> Are you sure this is based on v6.7-rc7? Because I don't see this macro.
Oh, it is based on Leobras' patches. I would remove it in the next of version.

> > [...]
> > -	       : "rJ" (__newx), "rJ" (~__mask) \
> > +	       : "rJ" (__newx), "rJ" (~__mask), "rJ" (__ptr32b) \
>
> I'm pretty sure we don't want to allow the J constraint for __ptr32b.
>
> > [...]
>
> Thanks,
> drew
On Wed, Jan 03, 2024 at 02:15:45PM +0800, Guo Ren wrote:
> On Tue, Jan 2, 2024 at 7:19 PM Andrew Jones <ajones@ventanamicro.com> wrote:
> > [...]
> >
> > Are you sure this is based on v6.7-rc7? Because I don't see this macro.
> Oh, it is based on Leobras' patches. I would remove it in the next of version.

I would say this next :)

> > [...]
> >
> > I'm pretty sure we don't want to allow the J constraint for __ptr32b.
> >
> > [...]

Nice patch :)
Any reason it's not needed in __arch_cmpxchg_masked(), and __arch_cmpxchg() ?

Thanks!
Leo
On Thu, Jan 4, 2024 at 3:45 AM Leonardo Bras <leobras@redhat.com> wrote:
> On Wed, Jan 03, 2024 at 02:15:45PM +0800, Guo Ren wrote:
> > [...]
> > Oh, it is based on Leobras' patches. I would remove it in the next of version.
>
> I would say this next :)
Thx for the grammar correction.

> > [...]
>
> Nice patch :)
> Any reason it's not needed in __arch_cmpxchg_masked(), and __arch_cmpxchg() ?
CAS is a conditional AMO, unlike xchg (standard AMO). Arm64 is wrong, or
they have a problem with the hardware.

> Thanks!
> Leo
On Thu, Jan 04, 2024 at 09:24:40AM +0800, Guo Ren wrote:
> On Thu, Jan 4, 2024 at 3:45 AM Leonardo Bras <leobras@redhat.com> wrote:
> > [...]
> >
> > I would say this next :)
> Thx for the grammar correction.

Oh, I was not intending to correct grammar.
I just meant the next thing I would mention is that it was based on top of
my patchset instead of v6.7-rc7:

> > [...]
> >
> > Nice patch :)
> > Any reason it's not needed in __arch_cmpxchg_masked(), and __arch_cmpxchg() ?
> CAS is a conditional AMO, unlike xchg (standard AMO). Arm64 is wrong, or
> they have a problem with the hardware.

Sorry, I was unable to fully understand the reason here.

You suggest that the PREFETCH.W was inserted on xchg_masked because it will
always switch the variable (no compare, blind CAS), but not on cmpxchg.

Is this because cmpxchg will depend on a compare, and thus it does not
guarantee a write? So it would be unwise to always prefetch cacheline
exclusiveness for this cpu, where shared state would be enough.
Is that correct?

Thanks!
Leo
On Thu, Jan 4, 2024 at 11:56 AM Leonardo Bras <leobras@redhat.com> wrote:
> On Thu, Jan 04, 2024 at 09:24:40AM +0800, Guo Ren wrote:
> > [...]
> > CAS is a conditional AMO, unlike xchg (standard AMO). Arm64 is wrong, or
> > they have a problem with the hardware.
>
> Sorry, I was unable to fully understand the reason here.
>
> You suggest that the PREFETCH.W was inserted on xchg_masked because it will
> always switch the variable (no compare, blind CAS), but not on cmpxchg.
>
> Is this because cmpxchg will depend on a compare, and thus it does not
> guarantee a write? So it would be unwise to always prefetch cacheline
Yes, it has a comparison, so a store may not exist there.

> exclusiveness for this cpu, where shared state would be enough.
> Is that correct?
Yes, exclusiveness would invalidate other harts' cache lines.

> Thanks!
> Leo
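The distinction being drawn here can be restated in plain C: an exchange always performs the store, while a compare-and-exchange only stores when the comparison succeeds, so an unconditional write-prefetch in front of it can pull the line into exclusive state for nothing. Below is a small user-space sketch using the GCC/Clang __atomic builtins, purely as an illustration; on RISC-V a 32-bit exchange typically lowers to amoswap.w, and the compare-exchange to an lr.w/sc.w loop when Zacas is unavailable.

#include <stdbool.h>
#include <stdint.h>

/* xchg-style: the store is unconditional, so prefetching the line for
 * write ahead of the operation is never wasted work. */
uint32_t exchange(uint32_t *p, uint32_t newval)
{
	return __atomic_exchange_n(p, newval, __ATOMIC_SEQ_CST);
}

/* cmpxchg-style: the store only happens when *p == expected, so a
 * write-prefetch would steal the line from other harts even on the
 * failure path, where a shared copy would have been enough. */
bool compare_exchange(uint32_t *p, uint32_t expected, uint32_t newval)
{
	return __atomic_compare_exchange_n(p, &expected, newval, false,
					   __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
}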
On Thu, Jan 04, 2024 at 04:14:27PM +0800, Guo Ren wrote:
> On Thu, Jan 4, 2024 at 11:56 AM Leonardo Bras <leobras@redhat.com> wrote:
> > [...]
> >
> > Is this because cmpxchg will depend on a compare, and thus it does not
> > guarantee a write? So it would be unwise to always prefetch cacheline
> Yes, it has a comparison, so a store may not exist there.
>
> > exclusiveness for this cpu, where shared state would be enough.
> > Is that correct?
> Yes, exclusiveness would invalidate other harts' cache lines.

I see.

I recall a previous discussion on computer arch which stated that any LR
would require to get a cacheline in exclusive state for lr/sc to work, but
I went through the RISC-V lr/sc documentation and could not find any info
about its cacheline behavior.

If this stands correct, the PREFETCH.W could be useful before every lr,
right?
(maybe that's the case for arm64 that you mentioned before)

Thanks!
Leo
On Thu, Jan 4, 2024 at 10:17 PM Leonardo Bras <leobras@redhat.com> wrote:
> On Thu, Jan 04, 2024 at 04:14:27PM +0800, Guo Ren wrote:
> > [...]
> > Yes, exclusiveness would invalidate other harts' cache lines.
>
> I see.
>
> I recall a previous discussion on computer arch which stated that any LR
> would require to get a cacheline in exclusive state for lr/sc to work, but
> I went through the RISC-V lr/sc documentation and could not find any info
> about its cacheline behavior.
No, lr couldn't get a cacheline in exclusive, that would break the ISA
design. Think about the "lr + wfe" pair.

> If this stands correct, the PREFETCH.W could be useful before every lr,
> right?
> (maybe that's the case for arm64 that you mentioned before)
The arm64 "lr + sc" cmpxchg version is not good, don't follow that.
They are moving to the LSE's cas instruction.

> Thanks!
> Leo
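For context on the "lr + wfe" remark: the pattern referred to is a wait loop in which a hart takes a reservation on a location and then stalls until another hart writes it (Arm's LDXR/WFE, or lr paired with Zawrs' wrs.nto on RISC-V). If lr were required to pull the line in exclusive, every waiter would keep stealing ownership from the hart that is about to perform the very store it is waiting for. The sketch below is my own illustration of that pattern, assuming a toolchain that accepts the wrs.nto mnemonic; it is not code from this series.

/* Wait until *p no longer holds @old (spurious wake-ups simply loop).
 * The lr.w establishes the reservation that wrs.nto uses as its
 * wake-up monitor, which is why lr must be able to leave the line in
 * a shared state. */
static inline unsigned int wait_for_change(unsigned int *p, unsigned int old)
{
	unsigned int val;

	do {
		__asm__ __volatile__ (
			"	lr.w	%0, %1\n"	/* load + reserve */
			"	bne	%0, %2, 1f\n"	/* changed: done  */
			"	wrs.nto\n"		/* stall until the reservation set is written */
			"1:\n"
			: "=&r" (val), "+A" (*p)
			: "r" (old)
			: "memory");
	} while (val == old);

	return val;
}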
diff --git a/arch/riscv/include/asm/cmpxchg.h b/arch/riscv/include/asm/cmpxchg.h
index 26cea2395aae..d7b9d7951f08 100644
--- a/arch/riscv/include/asm/cmpxchg.h
+++ b/arch/riscv/include/asm/cmpxchg.h
@@ -10,6 +10,7 @@
 
 #include <asm/barrier.h>
 #include <asm/fence.h>
+#include <asm/processor.h>
 
 #define __arch_xchg_masked(prepend, append, r, p, n) \
 ({ \
@@ -23,6 +24,7 @@
 \
 	__asm__ __volatile__ ( \
 	       prepend \
+	       PREFETCHW_ASM(%5) \
 	       "0:	lr.w %0, %2\n" \
 	       "	and  %1, %0, %z4\n" \
 	       "	or   %1, %1, %z3\n" \
@@ -30,7 +32,7 @@
 	       "	bnez %1, 0b\n" \
 	       append \
 	       : "=&r" (__retx), "=&r" (__rc), "+A" (*(__ptr32b)) \
-	       : "rJ" (__newx), "rJ" (~__mask) \
+	       : "rJ" (__newx), "rJ" (~__mask), "rJ" (__ptr32b) \
 	       : "memory"); \
 \
 	r = (__typeof__(*(p)))((__retx & __mask) >> __s); \
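For reference, __arch_xchg_masked() is the sub-word path: 8- and 16-bit exchanges have no native AMO (without Zabha), so they are emulated with an lr.w/sc.w loop on the aligned containing 32-bit word plus masking, and that loop is what now gains the prefetch. The usage sketch below is kernel-context pseudocode; it assumes the generic xchg() wrapper in this file dispatches by operand size, and the struct and function names are made up for illustration.

struct example {
	unsigned short	owner;	/* 16-bit field */
	unsigned long	count;	/* native-width field */
};

static unsigned short take_owner(struct example *e, unsigned short me)
{
	/* 16-bit xchg() -> __arch_xchg_masked(): lr.w/sc.w on the
	 * containing 32-bit word, now preceded by PREFETCHW_ASM(). */
	return xchg(&e->owner, me);
}

static unsigned long reset_count(struct example *e)
{
	/* Native-width xchg() -> a single amoswap, no lr/sc loop. */
	return xchg(&e->count, 0);
}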