aarch64: Fix movv8di for overlapping register and memory load [PR113550]

Message ID 20240124070103.3800874-1-quic_apinski@quicinc.com
State Accepted
Headers
Series aarch64: Fix movv8di for overlapping register and memory load [PR113550] |

Checks

Context Check Description
snail/gcc-patch-check success Github commit url

Commit Message

Andrew Pinski (QUIC) Jan. 24, 2024, 7:01 a.m. UTC
  The split for movv8di is not ready to handle the case where the setting
register overlaps with the address of the memory that is being load.
Fixing the split than just making the output constraint as an early clobber
for this alternative. The split would first need to figure out which register
is overlapping with the address and then only emit that move last.

Build and tested for aarch64-linux-gnu with no regressions

gcc/ChangeLog:

	* config/aarch64/aarch64-simd.md (*aarch64_movv8di): Mark the last
	alternative's output constraint as an early clobber.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
---
 gcc/config/aarch64/aarch64-simd.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
  

Comments

Richard Sandiford Jan. 25, 2024, 12:08 p.m. UTC | #1
Andrew Pinski <quic_apinski@quicinc.com> writes:
> The split for movv8di is not ready to handle the case where the setting
> register overlaps with the address of the memory that is being load.
> Fixing the split than just making the output constraint as an early clobber
> for this alternative. The split would first need to figure out which register
> is overlapping with the address and then only emit that move last.

I was curious how strained that detection would be in practice, and in
the end it didn't seem too bad.  I pushed the following variant after
testing on aarch64-linux-gnu.

Thanks,
Richard


The LS64 movv8di pattern didn't handle loads that overlapped with
the address register (unless the overlap happened to be in the
last subload).

gcc/
	PR target/113550
	* config/aarch64/aarch64-simd.md: In the movv8di splitter, check
	whether each split instruction is a load that clobbers the source
	address.  Emit that instruction last if so.

gcc/testsuite/
	PR target/113550
	* gcc.target/aarch64/pr113550.c: New test.
---
 gcc/config/aarch64/aarch64-simd.md          | 18 ++++++--
 gcc/testsuite/gcc.target/aarch64/pr113550.c | 48 +++++++++++++++++++++
 2 files changed, 62 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/pr113550.c

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 48f0741e7d0..f036f6ce997 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -8221,11 +8221,21 @@ (define_split
 	   || (memory_operand (operands[0], V8DImode)
 	       && register_operand (operands[1], V8DImode)))
     {
+      std::pair<rtx, rtx> last_pair = {};
       for (int offset = 0; offset < 64; offset += 16)
-	emit_move_insn (simplify_gen_subreg (TImode, operands[0],
-					     V8DImode, offset),
-			simplify_gen_subreg (TImode, operands[1],
-					     V8DImode, offset));
+        {
+	  std::pair<rtx, rtx> pair = {
+	    simplify_gen_subreg (TImode, operands[0], V8DImode, offset),
+	    simplify_gen_subreg (TImode, operands[1], V8DImode, offset)
+	  };
+	  if (register_operand (pair.first, TImode)
+	      && reg_overlap_mentioned_p (pair.first, pair.second))
+	    last_pair = pair;
+	  else
+	    emit_move_insn (pair.first, pair.second);
+        }
+      if (last_pair.first)
+	emit_move_insn (last_pair.first, last_pair.second);
       DONE;
     }
   else
diff --git a/gcc/testsuite/gcc.target/aarch64/pr113550.c b/gcc/testsuite/gcc.target/aarch64/pr113550.c
new file mode 100644
index 00000000000..0ff3c7b5c00
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr113550.c
@@ -0,0 +1,48 @@
+/* { dg-options "-O" } */
+/* { dg-do run } */
+
+#pragma GCC push_options
+#pragma GCC target "+ls64"
+#pragma GCC aarch64 "arm_acle.h"
+#pragma GCC pop_options
+
+#define DEF_FUNCTION(NAME, ARGS)		\
+  __attribute__((noipa))			\
+  __arm_data512_t				\
+  NAME ARGS					\
+  {						\
+    return *ptr;				\
+  }
+
+DEF_FUNCTION (f0, (__arm_data512_t *ptr))
+DEF_FUNCTION (f1, (int x0, __arm_data512_t *ptr))
+DEF_FUNCTION (f2, (int x0, int x1, __arm_data512_t *ptr))
+DEF_FUNCTION (f3, (int x0, int x1, int x2, __arm_data512_t *ptr))
+DEF_FUNCTION (f4, (int x0, int x1, int x2, int x3, __arm_data512_t *ptr))
+DEF_FUNCTION (f5, (int x0, int x1, int x2, int x3, int x4,
+		   __arm_data512_t *ptr))
+DEF_FUNCTION (f6, (int x0, int x1, int x2, int x3, int x4, int x5,
+		   __arm_data512_t *ptr))
+DEF_FUNCTION (f7, (int x0, int x1, int x2, int x3, int x4, int x5, int x6,
+		   __arm_data512_t *ptr))
+
+int
+main (void)
+{
+  __arm_data512_t x = { 0, 10, 20, 30, 40, 50, 60, 70 };
+  __arm_data512_t res[8] =
+  {
+    f0 (&x),
+    f1 (0, &x),
+    f2 (0, 1, &x),
+    f3 (0, 1, 2, &x),
+    f4 (0, 1, 2, 3, &x),
+    f5 (0, 1, 2, 3, 4, &x),
+    f6 (0, 1, 2, 3, 4, 5, &x),
+    f7 (0, 1, 2, 3, 4, 5, 6, &x)
+  };
+  for (int i = 0; i < 8; ++i)
+    if (__builtin_memcmp (&x, &res[i], sizeof (x)) != 0)
+      __builtin_abort ();
+  return 0;
+}
  

Patch

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 662ef696630..ba079298b84 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -7985,7 +7985,7 @@  (define_insn "*aarch64_mov<mode>"
 )
 
 (define_insn "*aarch64_movv8di"
-  [(set (match_operand:V8DI 0 "nonimmediate_operand" "=r,m,r")
+  [(set (match_operand:V8DI 0 "nonimmediate_operand" "=r,m,&r")
 	(match_operand:V8DI 1 "general_operand" " r,r,m"))]
   "(register_operand (operands[0], V8DImode)
     || register_operand (operands[1], V8DImode))"