[03/16] aarch64: Distinguish streaming-compatible AdvSIMD insns

Message ID mptbkpbp41y.fsf@arm.com
State Unresolved
Headers
Series aarch64: Add support for SME |

Checks

Context Check Description
snail/gcc-patch-check warning Git am fail log

Commit Message

Richard Sandiford Nov. 13, 2022, 10 a.m. UTC
  The vast majority of Advanced SIMD instructions are not
available in streaming mode, but some of the load/store/move
instructions are.  This patch adds a new target feature macro
called TARGET_BASE_SIMD for this streaming-compatible subset.

The vector-to-vector move instructions are not streaming-compatible,
so we need to use the SVE move instructions where enabled, or fall
back to the nofp16 handling otherwise.

I haven't found a good way of testing the SVE EXT alternative
in aarch64_simd_mov_from_<mode>high, but I'd rather provide it
than not.

gcc/
	* config/aarch64/aarch64.h (TARGET_BASE_SIMD): New macro.
	(TARGET_SIMD): Require PSTATE.SM to be 0.
	(AARCH64_ISA_SM_OFF): New macro.
	* config/aarch64/aarch64.cc (aarch64_array_mode_supported_p):
	Allow Advanced SIMD structure modes for TARGET_BASE_SIMD.
	(aarch64_print_operand): Support '%Z'.
	(aarch64_secondary_reload): Expect SVE moves to be used for
	Advanced SIMD modes if SVE is enabled and non-streaming
	Advanced SIMD isn't.
	(aarch64_register_move_cost): Likewise.
	(aarch64_simd_container_mode): Extend Advanced SIMD mode
	handling to TARGET_BASE_SIMD.
	(aarch64_expand_cpymem): Expand commentary.
	* config/aarch64/aarch64.md (arches): Add base_simd.
	(arch_enabled): Handle it.
	(*mov<mode>_aarch64): Extend UMOV alternative to TARGET_BASE_SIMD.
	(*movti_aarch64): Use an SVE move instruction if non-streaming
	SIMD isn't available.
	(*mov<TFD:mode>_aarch64): Likewise.
	(load_pair_dw_tftf): Extend to TARGET_BASE_SIMD.
	(store_pair_dw_tftf): Likewise.
	(loadwb_pair<TX:mode>_<P:mode>): Likewise.
	(storewb_pair<TX:mode>_<P:mode>): Likewise.
	* config/aarch64/aarch64-simd.md (*aarch64_simd_mov<VDMOV:mode>):
	Allow UMOV in streaming mode.
	(*aarch64_simd_mov<VQMOV:mode>): Use an SVE move instruction
	if non-streaming SIMD isn't available.
	(aarch64_store_lane0<mode>): Depend on TARGET_FLOAT rather than
	TARGET_SIMD.
	(aarch64_simd_mov_from_<mode>low): Likewise.  Use fmov if
	Advanced SIMD is completely disabled.
	(aarch64_simd_mov_from_<mode>high): Use SVE EXT instructions if
	non-streaming SIMD isn't available.

gcc/testsuite/
	* gcc.target/aarch64/movdf_2.c: New test.
	* gcc.target/aarch64/movdi_3.c: Likewise.
	* gcc.target/aarch64/movhf_2.c: Likewise.
	* gcc.target/aarch64/movhi_2.c: Likewise.
	* gcc.target/aarch64/movqi_2.c: Likewise.
	* gcc.target/aarch64/movsf_2.c: Likewise.
	* gcc.target/aarch64/movsi_2.c: Likewise.
	* gcc.target/aarch64/movtf_3.c: Likewise.
	* gcc.target/aarch64/movtf_4.c: Likewise.
	* gcc.target/aarch64/movti_3.c: Likewise.
	* gcc.target/aarch64/movti_4.c: Likewise.
	* gcc.target/aarch64/movv16qi_4.c: Likewise.
	* gcc.target/aarch64/movv16qi_5.c: Likewise.
	* gcc.target/aarch64/movv8qi_4.c: Likewise.
	* gcc.target/aarch64/sme/arm_neon_1.c: Likewise.
	* gcc.target/aarch64/sme/arm_neon_2.c: Likewise.
	* gcc.target/aarch64/sme/arm_neon_3.c: Likewise.
---
 gcc/config/aarch64/aarch64-simd.md            | 43 ++++++----
 gcc/config/aarch64/aarch64.cc                 | 22 +++--
 gcc/config/aarch64/aarch64.h                  | 12 ++-
 gcc/config/aarch64/aarch64.md                 | 45 +++++-----
 gcc/testsuite/gcc.target/aarch64/movdf_2.c    | 51 +++++++++++
 gcc/testsuite/gcc.target/aarch64/movdi_3.c    | 59 +++++++++++++
 gcc/testsuite/gcc.target/aarch64/movhf_2.c    | 53 ++++++++++++
 gcc/testsuite/gcc.target/aarch64/movhi_2.c    | 61 +++++++++++++
 gcc/testsuite/gcc.target/aarch64/movqi_2.c    | 59 +++++++++++++
 gcc/testsuite/gcc.target/aarch64/movsf_2.c    | 51 +++++++++++
 gcc/testsuite/gcc.target/aarch64/movsi_2.c    | 59 +++++++++++++
 gcc/testsuite/gcc.target/aarch64/movtf_3.c    | 81 +++++++++++++++++
 gcc/testsuite/gcc.target/aarch64/movtf_4.c    | 78 +++++++++++++++++
 gcc/testsuite/gcc.target/aarch64/movti_3.c    | 86 +++++++++++++++++++
 gcc/testsuite/gcc.target/aarch64/movti_4.c    | 83 ++++++++++++++++++
 gcc/testsuite/gcc.target/aarch64/movv16qi_4.c | 82 ++++++++++++++++++
 gcc/testsuite/gcc.target/aarch64/movv16qi_5.c | 79 +++++++++++++++++
 gcc/testsuite/gcc.target/aarch64/movv8qi_4.c  | 55 ++++++++++++
 .../gcc.target/aarch64/sme/arm_neon_1.c       | 13 +++
 .../gcc.target/aarch64/sme/arm_neon_2.c       | 11 +++
 .../gcc.target/aarch64/sme/arm_neon_3.c       | 11 +++
 21 files changed, 1047 insertions(+), 47 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/movdf_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/movdi_3.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/movhf_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/movhi_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/movqi_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/movsf_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/movsi_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/movtf_3.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/movtf_4.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/movti_3.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/movti_4.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/movv16qi_4.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/movv16qi_5.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/movv8qi_4.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/arm_neon_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/arm_neon_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/arm_neon_3.c
  

Patch

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 5386043739a..b6313cba172 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -133,7 +133,7 @@  (define_insn "*aarch64_simd_mov<VDMOV:mode>"
 	 return "mov\t%0.<Vbtype>, %1.<Vbtype>";
        return "fmov\t%d0, %d1";
      case 4:
-       if (TARGET_SIMD)
+       if (TARGET_BASE_SIMD)
 	 return "umov\t%0, %1.d[0]";
        return "fmov\t%x0, %d1";
      case 5: return "fmov\t%d0, %1";
@@ -152,9 +152,9 @@  (define_insn "*aarch64_simd_mov<VDMOV:mode>"
 
 (define_insn "*aarch64_simd_mov<VQMOV:mode>"
   [(set (match_operand:VQMOV 0 "nonimmediate_operand"
-		"=w, Umn,  m,  w, ?r, ?w, ?r, w,  w")
+		"=w, Umn, m,  w,  w, ?r, ?w, ?r,  w,  w")
 	(match_operand:VQMOV 1 "general_operand"
-		"m,  Dz, w,  w,  w,  r,  r, Dn, Dz"))]
+		 "m,  Dz, w,  w,  w,  w,  r,  r, Dn, Dz"))]
   "TARGET_FLOAT
    && (register_operand (operands[0], <MODE>mode)
        || aarch64_simd_reg_or_zero (operands[1], <MODE>mode))"
@@ -170,22 +170,24 @@  (define_insn "*aarch64_simd_mov<VQMOV:mode>"
     case 3:
 	return "mov\t%0.<Vbtype>, %1.<Vbtype>";
     case 4:
+	return "mov\t%Z0.d, %Z1.d";
     case 5:
     case 6:
-	return "#";
     case 7:
-	return aarch64_output_simd_mov_immediate (operands[1], 128);
+	return "#";
     case 8:
+	return aarch64_output_simd_mov_immediate (operands[1], 128);
+    case 9:
 	return "fmov\t%d0, xzr";
     default:
 	gcc_unreachable ();
     }
 }
   [(set_attr "type" "neon_load1_1reg<q>, store_16, neon_store1_1reg<q>,\
-		     neon_logic<q>, multiple, multiple,\
-		     multiple, neon_move<q>, fmov")
-   (set_attr "length" "4,4,4,4,8,8,8,4,4")
-   (set_attr "arch" "*,*,*,simd,*,*,*,simd,*")]
+		     neon_logic<q>, *, multiple, multiple,\
+		     multiple, neon_move<q>, f_mcr")
+   (set_attr "length" "4,4,4,4,4,8,8,8,4,4")
+   (set_attr "arch" "*,*,*,simd,sve,*,*,*,simd,*")]
 )
 
 ;; When storing lane zero we can use the normal STR and its more permissive
@@ -195,7 +197,7 @@  (define_insn "aarch64_store_lane0<mode>"
   [(set (match_operand:<VEL> 0 "memory_operand" "=m")
 	(vec_select:<VEL> (match_operand:VALL_F16 1 "register_operand" "w")
 			(parallel [(match_operand 2 "const_int_operand" "n")])))]
-  "TARGET_SIMD
+  "TARGET_FLOAT
    && ENDIAN_LANE_N (<nunits>, INTVAL (operands[2])) == 0"
   "str\\t%<Vetype>1, %0"
   [(set_attr "type" "neon_store1_1reg<q>")]
@@ -353,35 +355,38 @@  (define_expand "aarch64_get_high<mode>"
 )
 
 (define_insn_and_split "aarch64_simd_mov_from_<mode>low"
-  [(set (match_operand:<VHALF> 0 "register_operand" "=w,?r")
+  [(set (match_operand:<VHALF> 0 "register_operand" "=w,?r,?r")
         (vec_select:<VHALF>
-          (match_operand:VQMOV_NO2E 1 "register_operand" "w,w")
+          (match_operand:VQMOV_NO2E 1 "register_operand" "w,w,w")
           (match_operand:VQMOV_NO2E 2 "vect_par_cnst_lo_half" "")))]
-  "TARGET_SIMD"
+  "TARGET_FLOAT"
   "@
    #
-   umov\t%0, %1.d[0]"
+   umov\t%0, %1.d[0]
+   fmov\t%0, %d1"
   "&& reload_completed && aarch64_simd_register (operands[0], <VHALF>mode)"
   [(set (match_dup 0) (match_dup 1))]
   {
     operands[1] = aarch64_replace_reg_mode (operands[1], <VHALF>mode);
   }
-  [(set_attr "type" "mov_reg,neon_to_gp<q>")
+  [(set_attr "type" "mov_reg,neon_to_gp<q>,f_mrc")
+   (set_attr "arch" "simd,base_simd,*")
    (set_attr "length" "4")]
 )
 
 (define_insn "aarch64_simd_mov_from_<mode>high"
-  [(set (match_operand:<VHALF> 0 "register_operand" "=w,?r,?r")
+  [(set (match_operand:<VHALF> 0 "register_operand" "=w,w,?r,?r")
         (vec_select:<VHALF>
-          (match_operand:VQMOV_NO2E 1 "register_operand" "w,w,w")
+          (match_operand:VQMOV_NO2E 1 "register_operand" "w,0,w,w")
           (match_operand:VQMOV_NO2E 2 "vect_par_cnst_hi_half" "")))]
   "TARGET_FLOAT"
   "@
    dup\t%d0, %1.d[1]
+   ext\t%Z0.b, %Z0.b, %Z0.b, #8
    umov\t%0, %1.d[1]
    fmov\t%0, %1.d[1]"
-  [(set_attr "type" "neon_dup<q>,neon_to_gp<q>,f_mrc")
-   (set_attr "arch" "simd,simd,*")
+  [(set_attr "type" "neon_dup<q>,*,neon_to_gp<q>,f_mrc")
+   (set_attr "arch" "simd,sve,simd,*")
    (set_attr "length" "4")]
 )
 
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index fc6f0bc208a..36ef0435b4e 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -3726,7 +3726,7 @@  static bool
 aarch64_array_mode_supported_p (machine_mode mode,
 				unsigned HOST_WIDE_INT nelems)
 {
-  if (TARGET_SIMD
+  if (TARGET_BASE_SIMD
       && (AARCH64_VALID_SIMD_QREG_MODE (mode)
 	  || AARCH64_VALID_SIMD_DREG_MODE (mode))
       && (nelems >= 2 && nelems <= 4))
@@ -11876,6 +11876,10 @@  sizetochar (int size)
      'N':		Take the duplicated element in a vector constant
 			and print the negative of it in decimal.
      'b/h/s/d/q':	Print a scalar FP/SIMD register name.
+     'Z':		Same for SVE registers.  ('z' was already taken.)
+			Note that it is not necessary to use %Z for operands
+			that have SVE modes.  The convention is to use %Z
+			only for non-SVE (or potentially non-SVE) modes.
      'S/T/U/V':		Print a FP/SIMD register name for a register list.
 			The register printed is the FP/SIMD register name
 			of X + 0/1/2/3 for S/T/U/V.
@@ -12048,6 +12052,8 @@  aarch64_print_operand (FILE *f, rtx x, int code)
     case 's':
     case 'd':
     case 'q':
+    case 'Z':
+      code = TOLOWER (code);
       if (!REG_P (x) || !FP_REGNUM_P (REGNO (x)))
 	{
 	  output_operand_lossage ("incompatible floating point / vector register operand for '%%%c'", code);
@@ -12702,8 +12708,8 @@  aarch64_secondary_reload (bool in_p ATTRIBUTE_UNUSED, rtx x,
       return NO_REGS;
     }
 
-  /* Without the TARGET_SIMD instructions we cannot move a Q register
-     to a Q register directly.  We need a scratch.  */
+  /* Without the TARGET_SIMD or TARGET_SVE instructions we cannot move a
+     Q register to a Q register directly.  We need a scratch.  */
   if (REG_P (x)
       && (mode == TFmode
 	  || mode == TImode
@@ -15273,7 +15279,7 @@  aarch64_register_move_cost (machine_mode mode,
 	 secondary reload.  A general register is used as a scratch to move
 	 the upper DI value and the lower DI value is moved directly,
 	 hence the cost is the sum of three moves. */
-      if (! TARGET_SIMD)
+      if (!TARGET_SIMD && !TARGET_SVE)
 	return regmove_cost->GP2FP + regmove_cost->FP2GP + regmove_cost->FP2FP;
 
       return regmove_cost->FP2FP;
@@ -20773,7 +20779,7 @@  aarch64_simd_container_mode (scalar_mode mode, poly_int64 width)
     return aarch64_full_sve_mode (mode).else_mode (word_mode);
 
   gcc_assert (known_eq (width, 64) || known_eq (width, 128));
-  if (TARGET_SIMD)
+  if (TARGET_BASE_SIMD)
     {
       if (known_eq (width, 128))
 	return aarch64_vq_mode (mode).else_mode (word_mode);
@@ -24908,7 +24914,11 @@  aarch64_expand_cpymem (rtx *operands)
   int copy_bits = 256;
 
   /* Default to 256-bit LDP/STP on large copies, however small copies, no SIMD
-     support or slow 256-bit LDP/STP fall back to 128-bit chunks.  */
+     support or slow 256-bit LDP/STP fall back to 128-bit chunks.
+
+     ??? Although it would be possible to use LDP/STP Qn in streaming mode
+     (so using TARGET_BASE_SIMD instead of TARGET_SIMD), it isn't clear
+     whether that would improve performance.  */
   if (size <= 24
       || !TARGET_SIMD
       || (aarch64_tune_params.extra_tuning_flags
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index c47f27eefec..398cc03fd1f 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -61,8 +61,15 @@ 
 #define WORDS_BIG_ENDIAN (BYTES_BIG_ENDIAN)
 
 /* AdvSIMD is supported in the default configuration, unless disabled by
-   -mgeneral-regs-only or by the +nosimd extension.  */
-#define TARGET_SIMD (AARCH64_ISA_SIMD)
+   -mgeneral-regs-only or by the +nosimd extension.  The set of available
+   instructions is then subdivided into:
+
+   - the "base" set, available both in SME streaming mode and in
+     non-streaming mode
+
+   - the full set, available only in non-streaming mode.  */
+#define TARGET_BASE_SIMD (AARCH64_ISA_SIMD)
+#define TARGET_SIMD (AARCH64_ISA_SIMD && AARCH64_ISA_SM_OFF)
 #define TARGET_FLOAT (AARCH64_ISA_FP)
 
 #define UNITS_PER_WORD		8
@@ -199,6 +206,7 @@  constexpr auto AARCH64_FL_DEFAULT_ISA_MODE = AARCH64_FL_SM_OFF;
 
 /* Macros to test ISA flags.  */
 
+#define AARCH64_ISA_SM_OFF         (aarch64_isa_flags & AARCH64_FL_SM_OFF)
 #define AARCH64_ISA_MODE           (aarch64_isa_flags & AARCH64_FL_ISA_MODES)
 #define AARCH64_ISA_CRC            (aarch64_isa_flags & AARCH64_FL_CRC)
 #define AARCH64_ISA_CRYPTO         (aarch64_isa_flags & AARCH64_FL_CRYPTO)
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index cd6d5e5000c..3dc877ba9fe 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -374,7 +374,7 @@  (define_constants
 ;; As a convenience, "fp_q" means "fp" + the ability to move between
 ;; Q registers and is equivalent to "simd".
 
-(define_enum "arches" [ any rcpc8_4 fp fp_q simd sve fp16])
+(define_enum "arches" [any rcpc8_4 fp fp_q base_simd simd sve fp16])
 
 (define_enum_attr "arch" "arches" (const_string "any"))
 
@@ -402,6 +402,9 @@  (define_attr "arch_enabled" "no,yes"
 	(and (eq_attr "arch" "fp")
 	     (match_test "TARGET_FLOAT"))
 
+	(and (eq_attr "arch" "base_simd")
+	     (match_test "TARGET_BASE_SIMD"))
+
 	(and (eq_attr "arch" "fp_q, simd")
 	     (match_test "TARGET_SIMD"))
 
@@ -1215,7 +1218,7 @@  (define_insn "*mov<mode>_aarch64"
      case 8:
        return "str\t%<size>1, %0";
      case 9:
-       return TARGET_SIMD ? "umov\t%w0, %1.<v>[0]" : "fmov\t%w0, %s1";
+       return TARGET_BASE_SIMD ? "umov\t%w0, %1.<v>[0]" : "fmov\t%w0, %s1";
      case 10:
        return TARGET_SIMD ? "dup\t%0.<Vallxd>, %w1" : "fmov\t%s0, %w1";
      case 11:
@@ -1395,9 +1398,9 @@  (define_expand "movti"
 
 (define_insn "*movti_aarch64"
   [(set (match_operand:TI 0
-	 "nonimmediate_operand"  "=   r,w,w,w, r,w,r,m,m,w,m")
+	 "nonimmediate_operand"  "=   r,w,w,w, r,w,w,r,m,m,w,m")
 	(match_operand:TI 1
-	 "aarch64_movti_operand" " rUti,Z,Z,r, w,w,m,r,Z,m,w"))]
+	 "aarch64_movti_operand" " rUti,Z,Z,r, w,w,w,m,r,Z,m,w"))]
   "(register_operand (operands[0], TImode)
     || aarch64_reg_or_zero (operands[1], TImode))"
   "@
@@ -1407,16 +1410,17 @@  (define_insn "*movti_aarch64"
    #
    #
    mov\\t%0.16b, %1.16b
+   mov\\t%Z0.d, %Z1.d
    ldp\\t%0, %H0, %1
    stp\\t%1, %H1, %0
    stp\\txzr, xzr, %0
    ldr\\t%q0, %1
    str\\t%q1, %0"
-  [(set_attr "type" "multiple,neon_move,f_mcr,f_mcr,f_mrc,neon_logic_q, \
+  [(set_attr "type" "multiple,neon_move,f_mcr,f_mcr,f_mrc,neon_logic_q,*,\
 		             load_16,store_16,store_16,\
                              load_16,store_16")
-   (set_attr "length" "8,4,4,8,8,4,4,4,4,4,4")
-   (set_attr "arch" "*,simd,*,*,*,simd,*,*,*,fp,fp")]
+   (set_attr "length" "8,4,4,8,8,4,4,4,4,4,4,4")
+   (set_attr "arch" "*,simd,*,*,*,simd,sve,*,*,*,fp,fp")]
 )
 
 ;; Split a TImode register-register or register-immediate move into
@@ -1552,13 +1556,14 @@  (define_split
 
 (define_insn "*mov<mode>_aarch64"
   [(set (match_operand:TFD 0
-	 "nonimmediate_operand" "=w,?r ,w ,?r,w,?w,w,m,?r,m ,m")
+	 "nonimmediate_operand" "=w,w,?r ,w ,?r,w,?w,w,m,?r,m ,m")
 	(match_operand:TFD 1
-	 "general_operand"      " w,?rY,?r,w ,Y,Y ,m,w,m ,?r,Y"))]
+	 "general_operand"      " w,w,?rY,?r,w ,Y,Y ,m,w,m ,?r,Y"))]
   "TARGET_FLOAT && (register_operand (operands[0], <MODE>mode)
     || aarch64_reg_or_fp_zero (operands[1], <MODE>mode))"
   "@
    mov\\t%0.16b, %1.16b
+   mov\\t%Z0.d, %Z1.d
    #
    #
    #
@@ -1569,10 +1574,10 @@  (define_insn "*mov<mode>_aarch64"
    ldp\\t%0, %H0, %1
    stp\\t%1, %H1, %0
    stp\\txzr, xzr, %0"
-  [(set_attr "type" "logic_reg,multiple,f_mcr,f_mrc,neon_move_q,f_mcr,\
+  [(set_attr "type" "logic_reg,*,multiple,f_mcr,f_mrc,neon_move_q,f_mcr,\
                      f_loadd,f_stored,load_16,store_16,store_16")
-   (set_attr "length" "4,8,8,8,4,4,4,4,4,4,4")
-   (set_attr "arch" "simd,*,*,*,simd,*,*,*,*,*,*")]
+   (set_attr "length" "4,4,8,8,8,4,4,4,4,4,4,4")
+   (set_attr "arch" "simd,sve,*,*,*,simd,*,*,*,*,*,*")]
 )
 
 (define_split
@@ -1756,7 +1761,7 @@  (define_insn "load_pair_dw_tftf"
 	(match_operand:TF 1 "aarch64_mem_pair_operand" "Ump"))
    (set (match_operand:TF 2 "register_operand" "=w")
 	(match_operand:TF 3 "memory_operand" "m"))]
-   "TARGET_SIMD
+   "TARGET_BASE_SIMD
     && rtx_equal_p (XEXP (operands[3], 0),
 		    plus_constant (Pmode,
 				   XEXP (operands[1], 0),
@@ -1806,11 +1811,11 @@  (define_insn "store_pair_dw_tftf"
 	(match_operand:TF 1 "register_operand" "w"))
    (set (match_operand:TF 2 "memory_operand" "=m")
 	(match_operand:TF 3 "register_operand" "w"))]
-   "TARGET_SIMD &&
-    rtx_equal_p (XEXP (operands[2], 0),
-		 plus_constant (Pmode,
-				XEXP (operands[0], 0),
-				GET_MODE_SIZE (TFmode)))"
+   "TARGET_BASE_SIMD
+    && rtx_equal_p (XEXP (operands[2], 0),
+		    plus_constant (Pmode,
+				   XEXP (operands[0], 0),
+				   GET_MODE_SIZE (TFmode)))"
   "stp\\t%q1, %q3, %z0"
   [(set_attr "type" "neon_stp_q")
    (set_attr "fp" "yes")]
@@ -1858,7 +1863,7 @@  (define_insn "loadwb_pair<TX:mode>_<P:mode>"
      (set (match_operand:TX 3 "register_operand" "=w")
           (mem:TX (plus:P (match_dup 1)
 			  (match_operand:P 5 "const_int_operand" "n"))))])]
-  "TARGET_SIMD && INTVAL (operands[5]) == GET_MODE_SIZE (<TX:MODE>mode)"
+  "TARGET_BASE_SIMD && INTVAL (operands[5]) == GET_MODE_SIZE (<TX:MODE>mode)"
   "ldp\\t%q2, %q3, [%1], %4"
   [(set_attr "type" "neon_ldp_q")]
 )
@@ -1908,7 +1913,7 @@  (define_insn "storewb_pair<TX:mode>_<P:mode>"
      (set (mem:TX (plus:P (match_dup 0)
 			  (match_operand:P 5 "const_int_operand" "n")))
           (match_operand:TX 3 "register_operand" "w"))])]
-  "TARGET_SIMD
+  "TARGET_BASE_SIMD
    && INTVAL (operands[5])
       == INTVAL (operands[4]) + GET_MODE_SIZE (<TX:MODE>mode)"
   "stp\\t%q2, %q3, [%0, %4]!"
diff --git a/gcc/testsuite/gcc.target/aarch64/movdf_2.c b/gcc/testsuite/gcc.target/aarch64/movdf_2.c
new file mode 100644
index 00000000000..c2454d2c83e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/movdf_2.c
@@ -0,0 +1,51 @@ 
+/* { dg-do assemble } */
+/* { dg-options "-O --save-temps" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+/*
+** fpr_to_fpr:
+**	fmov	d0, d1
+**	ret
+*/
+double __attribute__((arm_streaming_compatible))
+fpr_to_fpr (double q0, double q1)
+{
+  return q1;
+}
+
+/*
+** gpr_to_fpr:
+**	fmov	d0, x0
+**	ret
+*/
+double __attribute__((arm_streaming_compatible))
+gpr_to_fpr ()
+{
+  register double x0 asm ("x0");
+  asm volatile ("" : "=r" (x0));
+  return x0;
+}
+
+/*
+** zero_to_fpr:
+**	fmov	d0, xzr
+**	ret
+*/
+double __attribute__((arm_streaming_compatible))
+zero_to_fpr ()
+{
+  return 0;
+}
+
+/*
+** fpr_to_gpr:
+**	fmov	x0, d0
+**	ret
+*/
+void __attribute__((arm_streaming_compatible))
+fpr_to_gpr (double q0)
+{
+  register double x0 asm ("x0");
+  x0 = q0;
+  asm volatile ("" :: "r" (x0));
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/movdi_3.c b/gcc/testsuite/gcc.target/aarch64/movdi_3.c
new file mode 100644
index 00000000000..5d369b27356
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/movdi_3.c
@@ -0,0 +1,59 @@ 
+/* { dg-do assemble } */
+/* { dg-options "-O --save-temps" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+#include <stdint.h>
+
+/*
+** fpr_to_fpr:
+**	fmov	d0, d1
+**	ret
+*/
+void __attribute__((arm_streaming_compatible))
+fpr_to_fpr (void)
+{
+  register uint64_t q0 asm ("q0");
+  register uint64_t q1 asm ("q1");
+  asm volatile ("" : "=w" (q1));
+  q0 = q1;
+  asm volatile ("" :: "w" (q0));
+}
+
+/*
+** gpr_to_fpr:
+**	fmov	d0, x0
+**	ret
+*/
+void __attribute__((arm_streaming_compatible))
+gpr_to_fpr (uint64_t x0)
+{
+  register uint64_t q0 asm ("q0");
+  q0 = x0;
+  asm volatile ("" :: "w" (q0));
+}
+
+/*
+** zero_to_fpr:
+**	fmov	d0, xzr
+**	ret
+*/
+void __attribute__((arm_streaming_compatible))
+zero_to_fpr ()
+{
+  register uint64_t q0 asm ("q0");
+  q0 = 0;
+  asm volatile ("" :: "w" (q0));
+}
+
+/*
+** fpr_to_gpr:
+**	fmov	x0, d0
+**	ret
+*/
+uint64_t __attribute__((arm_streaming_compatible))
+fpr_to_gpr ()
+{
+  register uint64_t q0 asm ("q0");
+  asm volatile ("" : "=w" (q0));
+  return q0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/movhf_2.c b/gcc/testsuite/gcc.target/aarch64/movhf_2.c
new file mode 100644
index 00000000000..cf3af357b84
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/movhf_2.c
@@ -0,0 +1,53 @@ 
+/* { dg-do assemble } */
+/* { dg-options "-O --save-temps" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+#pragma GCC target "+nothing+simd"
+
+/*
+** fpr_to_fpr:
+**	fmov	s0, s1
+**	ret
+*/
+_Float16 __attribute__((arm_streaming_compatible))
+fpr_to_fpr (_Float16 q0, _Float16 q1)
+{
+  return q1;
+}
+
+/*
+** gpr_to_fpr:
+**	fmov	s0, w0
+**	ret
+*/
+_Float16 __attribute__((arm_streaming_compatible))
+gpr_to_fpr ()
+{
+  register _Float16 w0 asm ("w0");
+  asm volatile ("" : "=r" (w0));
+  return w0;
+}
+
+/*
+** zero_to_fpr:
+**	fmov	s0, wzr
+**	ret
+*/
+_Float16 __attribute__((arm_streaming_compatible))
+zero_to_fpr ()
+{
+  return 0;
+}
+
+/*
+** fpr_to_gpr:
+**	fmov	w0, s0
+**	ret
+*/
+void __attribute__((arm_streaming_compatible))
+fpr_to_gpr (_Float16 q0)
+{
+  register _Float16 w0 asm ("w0");
+  w0 = q0;
+  asm volatile ("" :: "r" (w0));
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/movhi_2.c b/gcc/testsuite/gcc.target/aarch64/movhi_2.c
new file mode 100644
index 00000000000..108923449b9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/movhi_2.c
@@ -0,0 +1,61 @@ 
+/* { dg-do assemble } */
+/* { dg-options "-O --save-temps" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+#pragma GCC target "+nothing+simd"
+
+#include <stdint.h>
+
+/*
+** fpr_to_fpr:
+**	fmov	s0, s1
+**	ret
+*/
+void __attribute__((arm_streaming_compatible))
+fpr_to_fpr (void)
+{
+  register uint16_t q0 asm ("q0");
+  register uint16_t q1 asm ("q1");
+  asm volatile ("" : "=w" (q1));
+  q0 = q1;
+  asm volatile ("" :: "w" (q0));
+}
+
+/*
+** gpr_to_fpr:
+**	fmov	s0, w0
+**	ret
+*/
+void __attribute__((arm_streaming_compatible))
+gpr_to_fpr (uint16_t w0)
+{
+  register uint16_t q0 asm ("q0");
+  q0 = w0;
+  asm volatile ("" :: "w" (q0));
+}
+
+/*
+** zero_to_fpr:
+**	fmov	s0, wzr
+**	ret
+*/
+void __attribute__((arm_streaming_compatible))
+zero_to_fpr ()
+{
+  register uint16_t q0 asm ("q0");
+  q0 = 0;
+  asm volatile ("" :: "w" (q0));
+}
+
+/*
+** fpr_to_gpr:
+**	umov	w0, v0.h\[0\]
+**	ret
+*/
+uint16_t __attribute__((arm_streaming_compatible))
+fpr_to_gpr ()
+{
+  register uint16_t q0 asm ("q0");
+  asm volatile ("" : "=w" (q0));
+  return q0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/movqi_2.c b/gcc/testsuite/gcc.target/aarch64/movqi_2.c
new file mode 100644
index 00000000000..a28547d2ba3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/movqi_2.c
@@ -0,0 +1,59 @@ 
+/* { dg-do assemble } */
+/* { dg-options "-O --save-temps" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+#include <stdint.h>
+
+/*
+** fpr_to_fpr:
+**	fmov	s0, s1
+**	ret
+*/
+void __attribute__((arm_streaming_compatible))
+fpr_to_fpr (void)
+{
+  register uint8_t q0 asm ("q0");
+  register uint8_t q1 asm ("q1");
+  asm volatile ("" : "=w" (q1));
+  q0 = q1;
+  asm volatile ("" :: "w" (q0));
+}
+
+/*
+** gpr_to_fpr:
+**	fmov	s0, w0
+**	ret
+*/
+void __attribute__((arm_streaming_compatible))
+gpr_to_fpr (uint8_t w0)
+{
+  register uint8_t q0 asm ("q0");
+  q0 = w0;
+  asm volatile ("" :: "w" (q0));
+}
+
+/*
+** zero_to_fpr:
+**	fmov	s0, wzr
+**	ret
+*/
+void __attribute__((arm_streaming_compatible))
+zero_to_fpr ()
+{
+  register uint8_t q0 asm ("q0");
+  q0 = 0;
+  asm volatile ("" :: "w" (q0));
+}
+
+/*
+** fpr_to_gpr:
+**	umov	w0, v0.b\[0\]
+**	ret
+*/
+uint8_t __attribute__((arm_streaming_compatible))
+fpr_to_gpr ()
+{
+  register uint8_t q0 asm ("q0");
+  asm volatile ("" : "=w" (q0));
+  return q0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/movsf_2.c b/gcc/testsuite/gcc.target/aarch64/movsf_2.c
new file mode 100644
index 00000000000..53abd380510
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/movsf_2.c
@@ -0,0 +1,51 @@ 
+/* { dg-do assemble } */
+/* { dg-options "-O --save-temps" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+/*
+** fpr_to_fpr:
+**	fmov	s0, s1
+**	ret
+*/
+float __attribute__((arm_streaming_compatible))
+fpr_to_fpr (float q0, float q1)
+{
+  return q1;
+}
+
+/*
+** gpr_to_fpr:
+**	fmov	s0, w0
+**	ret
+*/
+float __attribute__((arm_streaming_compatible))
+gpr_to_fpr ()
+{
+  register float w0 asm ("w0");
+  asm volatile ("" : "=r" (w0));
+  return w0;
+}
+
+/*
+** zero_to_fpr:
+**	fmov	s0, wzr
+**	ret
+*/
+float __attribute__((arm_streaming_compatible))
+zero_to_fpr ()
+{
+  return 0;
+}
+
+/*
+** fpr_to_gpr:
+**	fmov	w0, s0
+**	ret
+*/
+void __attribute__((arm_streaming_compatible))
+fpr_to_gpr (float q0)
+{
+  register float w0 asm ("w0");
+  w0 = q0;
+  asm volatile ("" :: "r" (w0));
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/movsi_2.c b/gcc/testsuite/gcc.target/aarch64/movsi_2.c
new file mode 100644
index 00000000000..a0159d3fc1e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/movsi_2.c
@@ -0,0 +1,59 @@ 
+/* { dg-do assemble } */
+/* { dg-options "-O --save-temps" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+#include <stdint.h>
+
+/*
+** fpr_to_fpr:
+**	fmov	s0, s1
+**	ret
+*/
+void __attribute__((arm_streaming_compatible))
+fpr_to_fpr (void)
+{
+  register uint32_t q0 asm ("q0");
+  register uint32_t q1 asm ("q1");
+  asm volatile ("" : "=w" (q1));
+  q0 = q1;
+  asm volatile ("" :: "w" (q0));
+}
+
+/*
+** gpr_to_fpr:
+**	fmov	s0, w0
+**	ret
+*/
+void __attribute__((arm_streaming_compatible))
+gpr_to_fpr (uint32_t w0)
+{
+  register uint32_t q0 asm ("q0");
+  q0 = w0;
+  asm volatile ("" :: "w" (q0));
+}
+
+/*
+** zero_to_fpr:
+**	fmov	s0, wzr
+**	ret
+*/
+void __attribute__((arm_streaming_compatible))
+zero_to_fpr ()
+{
+  register uint32_t q0 asm ("q0");
+  q0 = 0;
+  asm volatile ("" :: "w" (q0));
+}
+
+/*
+** fpr_to_gpr:
+**	fmov	w0, s0
+**	ret
+*/
+uint32_t __attribute__((arm_streaming_compatible))
+fpr_to_gpr ()
+{
+  register uint32_t q0 asm ("q0");
+  asm volatile ("" : "=w" (q0));
+  return q0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/movtf_3.c b/gcc/testsuite/gcc.target/aarch64/movtf_3.c
new file mode 100644
index 00000000000..d38f59e2a1f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/movtf_3.c
@@ -0,0 +1,81 @@ 
+/* { dg-do assemble } */
+/* { dg-require-effective-target large_long_double } */
+/* { dg-options "-O -mtune=neoverse-v1 --save-temps" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+#pragma GCC target "+nosve"
+
+/*
+** fpr_to_fpr:
+**	sub	sp, sp, #16
+**	str	q1, \[sp\]
+**	ldr	q0, \[sp\]
+**	add	sp, sp, #?16
+**	ret
+*/
+long double __attribute__((arm_streaming_compatible))
+fpr_to_fpr (long double q0, long double q1)
+{
+  return q1;
+}
+
+/*
+** gpr_to_fpr:	{ target aarch64_little_endian }
+**	fmov	d0, x0
+**	fmov	v0.d\[1\], x1
+**	ret
+*/
+/*
+** gpr_to_fpr:	{ target aarch64_big_endian }
+**	fmov	d0, x1
+**	fmov	v0.d\[1\], x0
+**	ret
+*/
+long double __attribute__((arm_streaming_compatible))
+gpr_to_fpr ()
+{
+  register long double x0 asm ("x0");
+  asm volatile ("" : "=r" (x0));
+  return x0;
+}
+
+/*
+** zero_to_fpr:
+**	fmov	s0, wzr
+**	ret
+*/
+long double __attribute__((arm_streaming_compatible))
+zero_to_fpr ()
+{
+  return 0;
+}
+
+/*
+** fpr_to_gpr:	{ target aarch64_little_endian }
+** (
+**	fmov	x0, d0
+**	fmov	x1, v0.d\[1\]
+** |
+**	fmov	x1, v0.d\[1\]
+**	fmov	x0, d0
+** )
+**	ret
+*/
+/*
+** fpr_to_gpr:	{ target aarch64_big_endian }
+** (
+**	fmov	x1, d0
+**	fmov	x0, v0.d\[1\]
+** |
+**	fmov	x0, v0.d\[1\]
+**	fmov	x1, d0
+** )
+**	ret
+*/
+void __attribute__((arm_streaming_compatible))
+fpr_to_gpr (long double q0)
+{
+  register long double x0 asm ("x0");
+  x0 = q0;
+  asm volatile ("" :: "r" (x0));
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/movtf_4.c b/gcc/testsuite/gcc.target/aarch64/movtf_4.c
new file mode 100644
index 00000000000..5b7486c7887
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/movtf_4.c
@@ -0,0 +1,78 @@ 
+/* { dg-do assemble } */
+/* { dg-require-effective-target large_long_double } */
+/* { dg-options "-O -mtune=neoverse-v1 --save-temps" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+#pragma GCC target "+sve"
+
+/*
+** fpr_to_fpr:
+**	mov	z0.d, z1.d
+**	ret
+*/
+long double __attribute__((arm_streaming_compatible))
+fpr_to_fpr (long double q0, long double q1)
+{
+  return q1;
+}
+
+/*
+** gpr_to_fpr:	{ target aarch64_little_endian }
+**	fmov	d0, x0
+**	fmov	v0.d\[1\], x1
+**	ret
+*/
+/*
+** gpr_to_fpr:	{ target aarch64_big_endian }
+**	fmov	d0, x1
+**	fmov	v0.d\[1\], x0
+**	ret
+*/
+long double __attribute__((arm_streaming_compatible))
+gpr_to_fpr ()
+{
+  register long double x0 asm ("x0");
+  asm volatile ("" : "=r" (x0));
+  return x0;
+}
+
+/*
+** zero_to_fpr:
+**	fmov	s0, wzr
+**	ret
+*/
+long double __attribute__((arm_streaming_compatible))
+zero_to_fpr ()
+{
+  return 0;
+}
+
+/*
+** fpr_to_gpr:	{ target aarch64_little_endian }
+** (
+**	fmov	x0, d0
+**	fmov	x1, v0.d\[1\]
+** |
+**	fmov	x1, v0.d\[1\]
+**	fmov	x0, d0
+** )
+**	ret
+*/
+/*
+** fpr_to_gpr:	{ target aarch64_big_endian }
+** (
+**	fmov	x1, d0
+**	fmov	x0, v0.d\[1\]
+** |
+**	fmov	x0, v0.d\[1\]
+**	fmov	x1, d0
+** )
+**	ret
+*/
+void __attribute__((arm_streaming_compatible))
+fpr_to_gpr (long double q0)
+{
+  register long double x0 asm ("x0");
+  x0 = q0;
+  asm volatile ("" :: "r" (x0));
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/movti_3.c b/gcc/testsuite/gcc.target/aarch64/movti_3.c
new file mode 100644
index 00000000000..d846b09497e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/movti_3.c
@@ -0,0 +1,86 @@ 
+/* { dg-do assemble } */
+/* { dg-options "-O -mtune=neoverse-v1 --save-temps" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+#pragma GCC target "+nosve"
+
+/*
+** fpr_to_fpr:
+**	sub	sp, sp, #16
+**	str	q1, \[sp\]
+**	ldr	q0, \[sp\]
+**	add	sp, sp, #?16
+**	ret
+*/
+void __attribute__((arm_streaming_compatible))
+fpr_to_fpr (void)
+{
+  register __int128_t q0 asm ("q0");
+  register __int128_t q1 asm ("q1");
+  asm volatile ("" : "=w" (q1));
+  q0 = q1;
+  asm volatile ("" :: "w" (q0));
+}
+
+/*
+** gpr_to_fpr:	{ target aarch64_little_endian }
+**	fmov	d0, x0
+**	fmov	v0.d\[1\], x1
+**	ret
+*/
+/*
+** gpr_to_fpr:	{ target aarch64_big_endian }
+**	fmov	d0, x1
+**	fmov	v0.d\[1\], x0
+**	ret
+*/
+void __attribute__((arm_streaming_compatible))
+gpr_to_fpr (__int128_t x0)
+{
+  register __int128_t q0 asm ("q0");
+  q0 = x0;
+  asm volatile ("" :: "w" (q0));
+}
+
+/*
+** zero_to_fpr:
+**	fmov	d0, xzr
+**	ret
+*/
+void __attribute__((arm_streaming_compatible))
+zero_to_fpr ()
+{
+  register __int128_t q0 asm ("q0");
+  q0 = 0;
+  asm volatile ("" :: "w" (q0));
+}
+
+/*
+** fpr_to_gpr:	{ target aarch64_little_endian }
+** (
+**	fmov	x0, d0
+**	fmov	x1, v0.d\[1\]
+** |
+**	fmov	x1, v0.d\[1\]
+**	fmov	x0, d0
+** )
+**	ret
+*/
+/*
+** fpr_to_gpr:	{ target aarch64_big_endian }
+** (
+**	fmov	x1, d0
+**	fmov	x0, v0.d\[1\]
+** |
+**	fmov	x0, v0.d\[1\]
+**	fmov	x1, d0
+** )
+**	ret
+*/
+__int128_t __attribute__((arm_streaming_compatible))
+fpr_to_gpr ()
+{
+  register __int128_t q0 asm ("q0");
+  asm volatile ("" : "=w" (q0));
+  return q0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/movti_4.c b/gcc/testsuite/gcc.target/aarch64/movti_4.c
new file mode 100644
index 00000000000..01e5537e88f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/movti_4.c
@@ -0,0 +1,83 @@ 
+/* { dg-do assemble } */
+/* { dg-options "-O -mtune=neoverse-v1 --save-temps" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+#pragma GCC target "+sve"
+
+/*
+** fpr_to_fpr:
+**	mov	z0\.d, z1\.d
+**	ret
+*/
+void __attribute__((arm_streaming_compatible))
+fpr_to_fpr (void)
+{
+  register __int128_t q0 asm ("q0");
+  register __int128_t q1 asm ("q1");
+  asm volatile ("" : "=w" (q1));
+  q0 = q1;
+  asm volatile ("" :: "w" (q0));
+}
+
+/*
+** gpr_to_fpr:	{ target aarch64_little_endian }
+**	fmov	d0, x0
+**	fmov	v0.d\[1\], x1
+**	ret
+*/
+/*
+** gpr_to_fpr:	{ target aarch64_big_endian }
+**	fmov	d0, x1
+**	fmov	v0.d\[1\], x0
+**	ret
+*/
+void __attribute__((arm_streaming_compatible))
+gpr_to_fpr (__int128_t x0)
+{
+  register __int128_t q0 asm ("q0");
+  q0 = x0;
+  asm volatile ("" :: "w" (q0));
+}
+
+/*
+** zero_to_fpr:
+**	fmov	d0, xzr
+**	ret
+*/
+void __attribute__((arm_streaming_compatible))
+zero_to_fpr ()
+{
+  register __int128_t q0 asm ("q0");
+  q0 = 0;
+  asm volatile ("" :: "w" (q0));
+}
+
+/*
+** fpr_to_gpr:	{ target aarch64_little_endian }
+** (
+**	fmov	x0, d0
+**	fmov	x1, v0.d\[1\]
+** |
+**	fmov	x1, v0.d\[1\]
+**	fmov	x0, d0
+** )
+**	ret
+*/
+/*
+** fpr_to_gpr:	{ target aarch64_big_endian }
+** (
+**	fmov	x1, d0
+**	fmov	x0, v0.d\[1\]
+** |
+**	fmov	x0, v0.d\[1\]
+**	fmov	x1, d0
+** )
+**	ret
+*/
+__int128_t __attribute__((arm_streaming_compatible))
+fpr_to_gpr ()
+{
+  register __int128_t q0 asm ("q0");
+  asm volatile ("" : "=w" (q0));
+  return q0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/movv16qi_4.c b/gcc/testsuite/gcc.target/aarch64/movv16qi_4.c
new file mode 100644
index 00000000000..f0f8cb95750
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/movv16qi_4.c
@@ -0,0 +1,82 @@ 
+/* { dg-do assemble } */
+/* { dg-options "-O -mtune=neoverse-v1 --save-temps" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+#pragma GCC target "+nosve"
+
+typedef unsigned char v16qi __attribute__((vector_size(16)));
+
+/*
+** fpr_to_fpr:
+**	sub	sp, sp, #16
+**	str	q1, \[sp\]
+**	ldr	q0, \[sp\]
+**	add	sp, sp, #?16
+**	ret
+*/
+v16qi __attribute__((arm_streaming_compatible))
+fpr_to_fpr (v16qi q0, v16qi q1)
+{
+  return q1;
+}
+
+/*
+** gpr_to_fpr:	{ target aarch64_little_endian }
+**	fmov	d0, x0
+**	fmov	v0.d\[1\], x1
+**	ret
+*/
+/*
+** gpr_to_fpr:	{ target aarch64_big_endian }
+**	fmov	d0, x1
+**	fmov	v0.d\[1\], x0
+**	ret
+*/
+v16qi __attribute__((arm_streaming_compatible))
+gpr_to_fpr ()
+{
+  register v16qi x0 asm ("x0");
+  asm volatile ("" : "=r" (x0));
+  return x0;
+}
+
+/*
+** zero_to_fpr:
+**	fmov	d0, xzr
+**	ret
+*/
+v16qi __attribute__((arm_streaming_compatible))
+zero_to_fpr ()
+{
+  return (v16qi) {};
+}
+
+/*
+** fpr_to_gpr:	{ target aarch64_little_endian }
+** (
+**	umov	x0, v0.d\[0\]
+**	fmov	x1, v0.d\[1\]
+** |
+**	fmov	x1, v0.d\[1\]
+**	umov	x0, v0.d\[0\]
+** )
+**	ret
+*/
+/*
+** fpr_to_gpr:	{ target aarch64_big_endian }
+** (
+**	umov	x1, v0.d\[0\]
+**	fmov	x0, v0.d\[1\]
+** |
+**	fmov	x0, v0.d\[1\]
+**	umov	x1, v0.d\[0\]
+** )
+**	ret
+*/
+void __attribute__((arm_streaming_compatible))
+fpr_to_gpr (v16qi q0)
+{
+  register v16qi x0 asm ("x0");
+  x0 = q0;
+  asm volatile ("" :: "r" (x0));
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/movv16qi_5.c b/gcc/testsuite/gcc.target/aarch64/movv16qi_5.c
new file mode 100644
index 00000000000..db59f01376e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/movv16qi_5.c
@@ -0,0 +1,79 @@ 
+/* { dg-do assemble } */
+/* { dg-options "-O -mtune=neoverse-v1 --save-temps" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+#pragma GCC target "+sve"
+
+typedef unsigned char v16qi __attribute__((vector_size(16)));
+
+/*
+** fpr_to_fpr:
+**	mov	z0.d, z1.d
+**	ret
+*/
+v16qi __attribute__((arm_streaming_compatible))
+fpr_to_fpr (v16qi q0, v16qi q1)
+{
+  return q1;
+}
+
+/*
+** gpr_to_fpr:	{ target aarch64_little_endian }
+**	fmov	d0, x0
+**	fmov	v0.d\[1\], x1
+**	ret
+*/
+/*
+** gpr_to_fpr:	{ target aarch64_big_endian }
+**	fmov	d0, x1
+**	fmov	v0.d\[1\], x0
+**	ret
+*/
+v16qi __attribute__((arm_streaming_compatible))
+gpr_to_fpr ()
+{
+  register v16qi x0 asm ("x0");
+  asm volatile ("" : "=r" (x0));
+  return x0;
+}
+
+/*
+** zero_to_fpr:
+**	fmov	d0, xzr
+**	ret
+*/
+v16qi __attribute__((arm_streaming_compatible))
+zero_to_fpr ()
+{
+  return (v16qi) {};
+}
+
+/*
+** fpr_to_gpr:	{ target aarch64_little_endian }
+** (
+**	umov	x0, v0.d\[0\]
+**	fmov	x1, v0.d\[1\]
+** |
+**	fmov	x1, v0.d\[1\]
+**	umov	x0, v0.d\[0\]
+** )
+**	ret
+*/
+/*
+** fpr_to_gpr:	{ target aarch64_big_endian }
+** (
+**	umov	x1, v0.d\[0\]
+**	fmov	x0, v0.d\[1\]
+** |
+**	fmov	x0, v0.d\[1\]
+**	umov	x1, v0.d\[0\]
+** )
+**	ret
+*/
+void __attribute__((arm_streaming_compatible))
+fpr_to_gpr (v16qi q0)
+{
+  register v16qi x0 asm ("x0");
+  x0 = q0;
+  asm volatile ("" :: "r" (x0));
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/movv8qi_4.c b/gcc/testsuite/gcc.target/aarch64/movv8qi_4.c
new file mode 100644
index 00000000000..49eb2d31910
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/movv8qi_4.c
@@ -0,0 +1,55 @@ 
+/* { dg-do assemble } */
+/* { dg-options "-O -mtune=neoverse-v1 --save-temps" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+#pragma GCC target "+nosve"
+
+typedef unsigned char v8qi __attribute__((vector_size(8)));
+
+/*
+** fpr_to_fpr:
+**	fmov	d0, d1
+**	ret
+*/
+v8qi __attribute__((arm_streaming_compatible))
+fpr_to_fpr (v8qi q0, v8qi q1)
+{
+  return q1;
+}
+
+/*
+** gpr_to_fpr:
+**	fmov	d0, x0
+**	ret
+*/
+v8qi __attribute__((arm_streaming_compatible))
+gpr_to_fpr ()
+{
+  register v8qi x0 asm ("x0");
+  asm volatile ("" : "=r" (x0));
+  return x0;
+}
+
+/*
+** zero_to_fpr:
+**	fmov	d0, xzr
+**	ret
+*/
+v8qi __attribute__((arm_streaming_compatible))
+zero_to_fpr ()
+{
+  return (v8qi) {};
+}
+
+/*
+** fpr_to_gpr:
+**	umov	x0, v0\.d\[0\]
+**	ret
+*/
+void __attribute__((arm_streaming_compatible))
+fpr_to_gpr (v8qi q0)
+{
+  register v8qi x0 asm ("x0");
+  x0 = q0;
+  asm volatile ("" :: "r" (x0));
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/arm_neon_1.c b/gcc/testsuite/gcc.target/aarch64/sme/arm_neon_1.c
new file mode 100644
index 00000000000..4a526e7d125
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sme/arm_neon_1.c
@@ -0,0 +1,13 @@ 
+// { dg-options "" }
+
+#include <arm_neon.h>
+
+#pragma GCC target "+nosme"
+
+// { dg-error {inlining failed.*'vaddq_s32'} "" { target *-*-* } 0 }
+
+int32x4_t __attribute__((arm_streaming_compatible))
+foo (int32x4_t x, int32x4_t y)
+{
+  return vaddq_s32 (x, y);
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/arm_neon_2.c b/gcc/testsuite/gcc.target/aarch64/sme/arm_neon_2.c
new file mode 100644
index 00000000000..e7183caa6f8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sme/arm_neon_2.c
@@ -0,0 +1,11 @@ 
+// { dg-options "" }
+
+#include <arm_neon.h>
+
+// { dg-error {inlining failed.*'vaddq_s32'} "" { target *-*-* } 0 }
+
+int32x4_t __attribute__((arm_streaming_compatible))
+foo (int32x4_t x, int32x4_t y)
+{
+  return vaddq_s32 (x, y);
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/arm_neon_3.c b/gcc/testsuite/gcc.target/aarch64/sme/arm_neon_3.c
new file mode 100644
index 00000000000..e11570e41d1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sme/arm_neon_3.c
@@ -0,0 +1,11 @@ 
+// { dg-options "" }
+
+#include <arm_neon.h>
+
+// { dg-error {inlining failed.*'vaddq_s32'} "" { target *-*-* } 0 }
+
+int32x4_t __attribute__((arm_streaming))
+foo (int32x4_t x, int32x4_t y)
+{
+  return vaddq_s32 (x, y);
+}