s390: cmpxchg: Make loop condition for 1,2 byte cases precise

Message ID 20221116144711.3811011-1-scgl@linux.ibm.com
State New
Series s390: cmpxchg: Make loop condition for 1,2 byte cases precise

Commit Message

Janis Schoetterl-Glausch Nov. 16, 2022, 2:47 p.m. UTC
  The cmpxchg implementation for 1 and 2 bytes consists of a 4 byte
cmpxchg loop. Currently, the decision to retry is imprecise, looping if
bits outside the target byte(s) change instead of retrying until the
target byte(s) differ from the old value.
E.g. if an attempt to exchange (prev_left_0 old_bytes prev_right_0) is
made and it fails because the word at the address is
(prev_left_1 x prev_right_1), where both x != old_bytes and one of the
prev_*_1 values differs from the respective prev_*_0 value, the cmpxchg
is retried, even though an operation semantically equivalent to a normal
cmpxchg would simply fail the exchange.
Instead, exit the loop if x != old_bytes and retry otherwise.
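
The intended semantics, sketched in plain C for illustration only (this is
not the kernel implementation: a GCC __atomic builtin stands in for the CS
instruction, and an endianness check is added so the sketch also works on
little-endian hosts; s390 itself is big-endian):

#include <stdint.h>

/*
 * Illustration only: emulate a 1-byte cmpxchg on top of a 4-byte
 * compare-and-swap, using the precise retry condition described above.
 */
uint8_t cmpxchg_u8_emulated(uint8_t *ptr, uint8_t old, uint8_t new)
{
	uintptr_t addr = (uintptr_t)ptr;
#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
	unsigned int shift = (3 ^ (addr & 3)) << 3;	/* s390 formula (big-endian) */
#else
	unsigned int shift = (addr & 3) << 3;		/* little-endian host */
#endif
	uint32_t *word = (uint32_t *)(addr & ~(uintptr_t)3);
	uint32_t mask = ~(0xffU << shift);
	uint32_t prev = __atomic_load_n(word, __ATOMIC_RELAXED);

	for (;;) {
		uint32_t expected = (prev & mask) | ((uint32_t)old << shift);
		uint32_t desired  = (prev & mask) | ((uint32_t)new << shift);

		if (__atomic_compare_exchange_n(word, &expected, desired, 0,
						__ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST))
			return old;			/* exchange succeeded */
		prev = expected;			/* current word contents */
		/*
		 * Precise condition: retry only while the target byte still
		 * equals the old value; otherwise the cmpxchg fails.
		 */
		if (((prev >> shift) & 0xff) != old)
			return prev >> shift;
	}
}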

Signed-off-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com>
---


Unfortunately the diff got blown up quite a bit, even though the asm
changes are not that complex. This is mostly because of input operands
becoming (in)out operands.

I don't think all the '&' constraints are necessary, but I don't see how
they could affect code generation.
I don't see why we would need the memory clobber, however.

I tested the cmpxchg_user_key changes via the kvm memop selftest that is
part of the KVM cmpxchg memop series.
I looked for an existing way to test the cmpxchg changes, but didn't
find anything.


 arch/s390/include/asm/cmpxchg.h | 60 ++++++++++++++-----------
 arch/s390/include/asm/uaccess.h | 80 ++++++++++++++++++---------------
 2 files changed, 78 insertions(+), 62 deletions(-)


base-commit: b23ddf9d5a30f64a1a51a85f0d9e2553210b21a2
  

Comments

Heiko Carstens Nov. 17, 2022, 6:19 p.m. UTC | #1
On Wed, Nov 16, 2022 at 03:47:11PM +0100, Janis Schoetterl-Glausch wrote:
> The cmpxchg implementation for 1 and 2 bytes consists of a 4 byte
> cmpxchg loop. Currently, the decision to retry is imprecise, looping if
> bits outside the target byte(s) change instead of retrying until the
> target byte(s) differ from the old value.
> E.g. if an attempt to exchange (prev_left_0 old_bytes prev_right_0) is
> made and it fails because the word at the address is
> (prev_left_1 x prev_right_1), where both x != old_bytes and one of the
> prev_*_1 values differs from the respective prev_*_0 value, the cmpxchg
> is retried, even though an operation semantically equivalent to a normal
> cmpxchg would simply fail the exchange.
> Instead, exit the loop if x != old_bytes and retry otherwise.
> 
> Signed-off-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com>
> ---
> 
> 
> Unfortunately the diff got blown up quite a bit, even though the asm
> changes are not that complex. This is mostly because of input operands
> becoming (in)out operands.
> 
> I don't think all the '&' constraints are necessary, but I don't see how
> they could affect code generation.

For cmpxchg() it wouldn't make any difference. For cmpxchg_user_key()
it might lead to a small improvement, since the register that is
allocated for the key variable might be reused. But I haven't looked
into that in detail. None of the early clobbers is necessary anymore
after your changes, but let's leave it as it is.
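
(For readers unfamiliar with the modifier: '&' marks an output operand as
earlyclobber, i.e. it may be written before all inputs have been consumed,
so the compiler must not assign it the same register as any input. A
minimal illustration, not taken from the patch:)

int out, a = 1, b = 2;

/*
 * %[out] is written by the first instruction before %[b] is read by
 * the second.  "=&d" (earlyclobber) keeps the compiler from putting
 * 'out' and 'b' into the same register; with plain "=d" it could, and
 * the ar would then add 'out' to itself.
 */
asm("	lr	%[out],%[a]\n"
    "	ar	%[out],%[b]"
    : [out] "=&d" (out)
    : [a] "d" (a), [b] "d" (b));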

> I don't see why we would need the memory clobber, however.

The memory clobber (aka memory barrier) is necessary because cmpxchg()
may be used to implement e.g. some custom locking. For that it is
required that the compiler does not reorder read or write accesses
behind/before the inline assembly.
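
(A minimal sketch of that argument, not kernel code: a trivial lock built
on a cmpxchg-style primitive. GCC __atomic builtins stand in for the
kernel's cmpxchg() here; they already imply the compiler barrier that the
"memory" clobber provides for the hand-written asm.)

static unsigned int lock_word;
static int shared_counter;

static void my_lock(void)
{
	unsigned int expected;

	do {
		expected = 0;	/* stand-in for cmpxchg(&lock_word, 0, 1) */
	} while (!__atomic_compare_exchange_n(&lock_word, &expected, 1, 0,
					      __ATOMIC_ACQUIRE, __ATOMIC_RELAXED));
}

static void my_unlock(void)
{
	__atomic_store_n(&lock_word, 0, __ATOMIC_RELEASE);
}

static void increment_under_lock(void)
{
	my_lock();
	/*
	 * The compiler barrier implied by the cmpxchg (the "memory"
	 * clobber in the kernel's asm) is what keeps this access from
	 * being moved outside the critical section.
	 */
	shared_counter++;
	my_unlock();
}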

> I tested the cmpxchg_user_key changes via the kvm memop selftest that is
> part of the KVM cmpxchg memop series.
> I looked for an existing way to test the cmpxchg changes, but didn't
> find anything.

Yeah, guess having a test for this would be nice to have :)
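
(A hypothetical sketch of what such a test might check, not an existing
selftest; it reuses the cmpxchg_u8_emulated() illustration from the commit
message above. The byte-sized cmpxchg must return the previous byte, leave
the neighbouring bytes untouched, and fail when the target byte differs
from the expected old value.)

#include <assert.h>
#include <stdint.h>

uint8_t cmpxchg_u8_emulated(uint8_t *ptr, uint8_t old, uint8_t new);

int main(void)
{
	static uint8_t buf[4] __attribute__((aligned(4))) =
		{ 0x11, 0x22, 0x33, 0x44 };

	/* Successful exchange: returns the old byte, neighbours untouched. */
	assert(cmpxchg_u8_emulated(&buf[1], 0x22, 0xaa) == 0x22);
	assert(buf[0] == 0x11 && buf[1] == 0xaa &&
	       buf[2] == 0x33 && buf[3] == 0x44);

	/* Failed exchange: target byte differs, nothing is written. */
	assert(cmpxchg_u8_emulated(&buf[1], 0x22, 0xbb) == 0xaa);
	assert(buf[1] == 0xaa);

	return 0;
}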

>  arch/s390/include/asm/cmpxchg.h | 60 ++++++++++++++-----------
>  arch/s390/include/asm/uaccess.h | 80 ++++++++++++++++++---------------
>  2 files changed, 78 insertions(+), 62 deletions(-)

The patch looks good - applied to the wip/cmpxchg_user_key branch.
  

Patch

diff --git a/arch/s390/include/asm/cmpxchg.h b/arch/s390/include/asm/cmpxchg.h
index 1c5785b851ec..3f26416c2ad8 100644
--- a/arch/s390/include/asm/cmpxchg.h
+++ b/arch/s390/include/asm/cmpxchg.h
@@ -90,55 +90,63 @@  static __always_inline unsigned long __cmpxchg(unsigned long address,
 {
 	switch (size) {
 	case 1: {
-		unsigned int prev, tmp, shift;
+		unsigned int prev, shift, mask;
 
 		shift = (3 ^ (address & 3)) << 3;
 		address ^= address & 3;
+		old = (old & 0xff) << shift;
+		new = (new & 0xff) << shift;
+		mask = ~(0xff << shift);
 		asm volatile(
 			"	l	%[prev],%[address]\n"
-			"0:	nr	%[prev],%[mask]\n"
-			"	lr	%[tmp],%[prev]\n"
-			"	or	%[prev],%[old]\n"
-			"	or	%[tmp],%[new]\n"
-			"	cs	%[prev],%[tmp],%[address]\n"
+			"	nr	%[prev],%[mask]\n"
+			"	xilf	%[mask],0xffffffff\n"
+			"	or	%[new],%[prev]\n"
+			"	or	%[prev],%[tmp]\n"
+			"0:	lr	%[tmp],%[prev]\n"
+			"	cs	%[prev],%[new],%[address]\n"
 			"	jnl	1f\n"
 			"	xr	%[tmp],%[prev]\n"
+			"	xr	%[new],%[tmp]\n"
 			"	nr	%[tmp],%[mask]\n"
-			"	jnz	0b\n"
+			"	jz	0b\n"
 			"1:"
 			: [prev] "=&d" (prev),
-			  [tmp] "=&d" (tmp),
-			  [address] "+Q" (*(int *)address)
-			: [old] "d" ((old & 0xff) << shift),
-			  [new] "d" ((new & 0xff) << shift),
-			  [mask] "d" (~(0xff << shift))
-			: "memory", "cc");
+			  [address] "+Q" (*(int *)address),
+			  [tmp] "+&d" (old),
+			  [new] "+&d" (new),
+			  [mask] "+&d" (mask)
+			:: "memory", "cc");
 		return prev >> shift;
 	}
 	case 2: {
-		unsigned int prev, tmp, shift;
+		unsigned int prev, shift, mask;
 
 		shift = (2 ^ (address & 2)) << 3;
 		address ^= address & 2;
+		old = (old & 0xffff) << shift;
+		new = (new & 0xffff) << shift;
+		mask = ~(0xffff << shift);
 		asm volatile(
 			"	l	%[prev],%[address]\n"
-			"0:	nr	%[prev],%[mask]\n"
-			"	lr	%[tmp],%[prev]\n"
-			"	or	%[prev],%[old]\n"
-			"	or	%[tmp],%[new]\n"
-			"	cs	%[prev],%[tmp],%[address]\n"
+			"	nr	%[prev],%[mask]\n"
+			"	xilf	%[mask],0xffffffff\n"
+			"	or	%[new],%[prev]\n"
+			"	or	%[prev],%[tmp]\n"
+			"0:	lr	%[tmp],%[prev]\n"
+			"	cs	%[prev],%[new],%[address]\n"
 			"	jnl	1f\n"
 			"	xr	%[tmp],%[prev]\n"
+			"	xr	%[new],%[tmp]\n"
 			"	nr	%[tmp],%[mask]\n"
-			"	jnz	0b\n"
+			"	jz	0b\n"
 			"1:"
 			: [prev] "=&d" (prev),
-			  [tmp] "=&d" (tmp),
-			  [address] "+Q" (*(int *)address)
-			: [old] "d" ((old & 0xffff) << shift),
-			  [new] "d" ((new & 0xffff) << shift),
-			  [mask] "d" (~(0xffff << shift))
-			: "memory", "cc");
+			  [address] "+Q" (*(int *)address),
+			  [tmp] "+&d" (old),
+			  [new] "+&d" (new),
+			  [mask] "+&d" (mask)
+			:: "memory", "cc");
 		return prev >> shift;
 	}
 	case 4: {
diff --git a/arch/s390/include/asm/uaccess.h b/arch/s390/include/asm/uaccess.h
index a125e60a1521..d028ee59e941 100644
--- a/arch/s390/include/asm/uaccess.h
+++ b/arch/s390/include/asm/uaccess.h
@@ -400,74 +400,82 @@  static __always_inline int __cmpxchg_user_key(unsigned long address, void *uval,
 
 	switch (size) {
 	case 1: {
-		unsigned int prev, tmp, shift;
+		unsigned int prev, shift, mask, _old, _new;
 
 		shift = (3 ^ (address & 3)) << 3;
 		address ^= address & 3;
+		_old = (old & 0xff) << shift;
+		_new = (new & 0xff) << shift;
+		mask = ~(0xff << shift);
 		asm volatile(
 			"	spka	0(%[key])\n"
 			"	sacf	256\n"
 			"0:	l	%[prev],%[address]\n"
 			"1:	nr	%[prev],%[mask]\n"
-			"	lr	%[tmp],%[prev]\n"
-			"	or	%[prev],%[old]\n"
-			"	or	%[tmp],%[new]\n"
-			"2:	cs	%[prev],%[tmp],%[address]\n"
-			"3:	jnl	4f\n"
+			"	xilf	%[mask],0xffffffff\n"
+			"	or	%[new],%[prev]\n"
+			"	or	%[prev],%[tmp]\n"
+			"2:	lr	%[tmp],%[prev]\n"
+			"3:	cs	%[prev],%[new],%[address]\n"
+			"4:	jnl	5f\n"
 			"	xr	%[tmp],%[prev]\n"
+			"	xr	%[new],%[tmp]\n"
 			"	nr	%[tmp],%[mask]\n"
-			"	jnz	1b\n"
-			"4:	sacf	768\n"
+			"	jz	2b\n"
+			"5:	sacf	768\n"
 			"	spka	%[default_key]\n"
-			EX_TABLE_UA_LOAD_REG(0b, 4b, %[rc], %[prev])
-			EX_TABLE_UA_LOAD_REG(1b, 4b, %[rc], %[prev])
-			EX_TABLE_UA_LOAD_REG(2b, 4b, %[rc], %[prev])
-			EX_TABLE_UA_LOAD_REG(3b, 4b, %[rc], %[prev])
+			EX_TABLE_UA_LOAD_REG(0b, 5b, %[rc], %[prev])
+			EX_TABLE_UA_LOAD_REG(1b, 5b, %[rc], %[prev])
+			EX_TABLE_UA_LOAD_REG(3b, 5b, %[rc], %[prev])
+			EX_TABLE_UA_LOAD_REG(4b, 5b, %[rc], %[prev])
 			: [rc] "+&d" (rc),
 			  [prev] "=&d" (prev),
-			  [tmp] "=&d" (tmp),
-			  [address] "+Q" (*(int *)address)
-			: [old] "d" (((unsigned int)old & 0xff) << shift),
-			  [new] "d" (((unsigned int)new & 0xff) << shift),
-			  [mask] "d" (~(0xff << shift)),
-			  [key] "a" (key << 4),
+			  [address] "+Q" (*(int *)address),
+			  [tmp] "+&d" (_old),
+			  [new] "+&d" (_new),
+			  [mask] "+&d" (mask)
+			: [key] "a" (key << 4),
 			  [default_key] "J" (PAGE_DEFAULT_KEY)
 			: "memory", "cc");
 		*(unsigned char *)uval = prev >> shift;
 		return rc;
 	}
 	case 2: {
-		unsigned int prev, tmp, shift;
+		unsigned int prev, shift, mask, _old, _new;
 
 		shift = (2 ^ (address & 2)) << 3;
 		address ^= address & 2;
+		_old = (old & 0xffff) << shift;
+		_new = (new & 0xffff) << shift;
+		mask = ~(0xffff << shift);
 		asm volatile(
 			"	spka	0(%[key])\n"
 			"	sacf	256\n"
 			"0:	l	%[prev],%[address]\n"
 			"1:	nr	%[prev],%[mask]\n"
-			"	lr	%[tmp],%[prev]\n"
-			"	or	%[prev],%[old]\n"
-			"	or	%[tmp],%[new]\n"
-			"2:	cs	%[prev],%[tmp],%[address]\n"
-			"3:	jnl	4f\n"
+			"	xilf	%[mask],0xffffffff\n"
+			"	or	%[new],%[prev]\n"
+			"	or	%[prev],%[tmp]\n"
+			"2:	lr	%[tmp],%[prev]\n"
+			"3:	cs	%[prev],%[new],%[address]\n"
+			"4:	jnl	5f\n"
 			"	xr	%[tmp],%[prev]\n"
+			"	xr	%[new],%[tmp]\n"
 			"	nr	%[tmp],%[mask]\n"
-			"	jnz	1b\n"
-			"4:	sacf	768\n"
+			"	jz	2b\n"
+			"5:	sacf	768\n"
 			"	spka	%[default_key]\n"
-			EX_TABLE_UA_LOAD_REG(0b, 4b, %[rc], %[prev])
-			EX_TABLE_UA_LOAD_REG(1b, 4b, %[rc], %[prev])
-			EX_TABLE_UA_LOAD_REG(2b, 4b, %[rc], %[prev])
-			EX_TABLE_UA_LOAD_REG(3b, 4b, %[rc], %[prev])
+			EX_TABLE_UA_LOAD_REG(0b, 5b, %[rc], %[prev])
+			EX_TABLE_UA_LOAD_REG(1b, 5b, %[rc], %[prev])
+			EX_TABLE_UA_LOAD_REG(3b, 5b, %[rc], %[prev])
+			EX_TABLE_UA_LOAD_REG(4b, 5b, %[rc], %[prev])
 			: [rc] "+&d" (rc),
 			  [prev] "=&d" (prev),
-			  [tmp] "=&d" (tmp),
-			  [address] "+Q" (*(int *)address)
-			: [old] "d" (((unsigned int)old & 0xffff) << shift),
-			  [new] "d" (((unsigned int)new & 0xffff) << shift),
-			  [mask] "d" (~(0xffff << shift)),
-			  [key] "a" (key << 4),
+			  [address] "+Q" (*(int *)address),
+			  [tmp] "+&d" (_old),
+			  [new] "+&d" (_new),
+			  [mask] "+&d" (mask)
+			: [key] "a" (key << 4),
 			  [default_key] "J" (PAGE_DEFAULT_KEY)
 			: "memory", "cc");
 		*(unsigned short *)uval = prev >> shift;