From patchwork Fri Aug 5 12:55:02 2022
X-Patchwork-Id: 401
From: "Andre Vieira (lists)"
Date: Fri, 5 Aug 2022 13:55:02 +0100
Subject: [PATCH 2/4] aarch64: Change aarch64_expand_vector_init to use rtx_vector_builder
To: gcc-patches@gcc.gnu.org
Cc: Richard Sandiford
Message-ID: <317d0d74-e7e1-05e8-45d3-98bbc929a922@arm.com>
In-Reply-To: <95d2de77-5b68-6d0b-ac99-ac1ca28835e2@arm.com>
References: <95d2de77-5b68-6d0b-ac99-ac1ca28835e2@arm.com>

Hi,

This patch changes aarch64_expand_vector_init to use rtx_vector_builder,
exploiting its internal pattern detection to find 'dup' patterns.
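For reference, rtx_vector_builder canonicalises the elements pushed into it
into npatterns () leading elements, each repeated nelts_per_pattern () times,
so an initialiser like {a, b, a, b} comes back with npatterns () == 2 and
nelts_per_pattern () == 1: exactly the "initialise a prefix, then duplicate
it" shape the new code looks for.  A rough standalone analogue of that
detection (an illustrative C++ sketch only, not the builder's actual
implementation):

#include <cstddef>
#include <string>
#include <vector>
#include <iostream>

/* Return the length of the shortest power-of-two prefix of VALS that,
   repeated, reproduces VALS; this plays the role of npatterns () in the
   nelts_per_pattern () == 1 case.  */
static size_t
repeating_prefix (const std::vector<std::string> &vals)
{
  for (size_t n = 1; n < vals.size (); n *= 2)
    {
      bool ok = true;
      for (size_t i = n; i < vals.size () && ok; ++i)
        ok = vals[i] == vals[i % n];
      if (ok)
        return n;
    }
  return vals.size ();
}

int
main ()
{
  /* {a, b, a, b}: a prefix of two lanes repeats, so initialise two lanes
     and 'dup' them across the rest of the vector.  */
  std::vector<std::string> v = {"a", "b", "a", "b"};
  std::cout << repeating_prefix (v) << "\n"; /* prints 2 */
}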
Bootstrapped and regression tested on aarch64-none-linux-gnu.

Is this OK for trunk or should we wait for the rest of the series?

gcc/ChangeLog:

2022-08-05  Andre Vieira

	* config/aarch64/aarch64.cc (aarch64_vec_duplicate): New.
	(aarch64_expand_vector_init): Make the existing variant construct
	an rtx_vector_builder from the list of elements and use this to
	detect duplicate patterns.

gcc/testsuite/ChangeLog:

2022-08-05  Andre Vieira

	* gcc.target/aarch64/ldp_stp_16.c: Modify to reflect code change.
	* gcc.target/aarch64/vect_init.c: New test.
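One note on the new aarch64_vec_duplicate helper in the patch below: when
only the first N lanes vary, it reinterprets those lanes as a single wider
integer element and emits one vec_duplicate of it, which is where the
"dup v0.4s, v0.s[0]"-style instructions in the new tests come from.  A
minimal host-side demonstration of the reinterpretation identity this relies
on (illustrative C++ under a little-endian lane-order assumption, not part
of the patch):

#include <cassert>
#include <cstdint>
#include <cstring>

int
main ()
{
  /* Only lanes 0-1 of an 8 x 16-bit vector have been initialised.  */
  int16_t lanes[8] = {1, 2, 0, 0, 0, 0, 0, 0};
  int16_t expect[8] = {1, 2, 1, 2, 1, 2, 1, 2};

  /* View the two initialised lanes as one 32-bit element...  */
  uint32_t pair;
  std::memcpy (&pair, lanes, sizeof pair);

  /* ...and duplicate that element across the full width, as a
     "dup v0.4s, v0.s[0]" would.  */
  uint32_t wide[4] = {pair, pair, pair, pair};

  int16_t out[8];
  std::memcpy (out, wide, sizeof out);
  assert (std::memcmp (out, expect, sizeof out) == 0);
  return 0;
}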
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 4b486aeea90ea2afb9cdd96a4dbe15c5bb2abd7a..a08043e18d609e258ebfe033875201163d129aba 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -305,6 +305,7 @@ static machine_mode aarch64_simd_container_mode (scalar_mode, poly_int64);
 static bool aarch64_print_address_internal (FILE*, machine_mode, rtx,
					     aarch64_addr_query_type);
 static HOST_WIDE_INT aarch64_clamp_to_uimm12_shift (HOST_WIDE_INT val);
+static void aarch64_expand_vector_init (rtx, rtx_vector_builder&);
 
 /* The processor for which instructions should be scheduled.  */
 enum aarch64_processor aarch64_tune = cortexa53;
@@ -21804,55 +21805,96 @@ aarch64_simd_make_constant (rtx vals)
     return NULL_RTX;
 }
 
+static void
+aarch64_vec_duplicate (rtx target, machine_mode mode, machine_mode element_mode,
+		       int narrow_n_elts)
+{
+  poly_uint64 size = narrow_n_elts * GET_MODE_BITSIZE (element_mode);
+  scalar_mode i_mode = int_mode_for_size (size, 0).require ();
+  machine_mode o_mode;
+  if (aarch64_sve_mode_p (mode))
+    o_mode = aarch64_full_sve_mode (i_mode).require ();
+  else
+    o_mode
+      = aarch64_simd_container_mode (i_mode,
+				     GET_MODE_BITSIZE (mode));
+  rtx input = simplify_gen_subreg (i_mode, target, mode, 0);
+  rtx output = simplify_gen_subreg (o_mode, target, mode, 0);
+  aarch64_emit_move (output, gen_vec_duplicate (o_mode, input));
+}
+
+
 /* Expand a vector initialisation sequence, such that TARGET is
    initialised to contain VALS.  */
 
 void
 aarch64_expand_vector_init (rtx target, rtx vals)
 {
-  machine_mode mode = GET_MODE (target);
-  scalar_mode inner_mode = GET_MODE_INNER (mode);
   /* The number of vector elements.  */
   int n_elts = XVECLEN (vals, 0);
-  /* The number of vector elements which are not constant.  */
-  int n_var = 0;
-  rtx any_const = NULL_RTX;
+  machine_mode mode = GET_MODE (target);
+  scalar_mode inner_mode = GET_MODE_INNER (mode);
   /* The first element of vals.  */
   rtx v0 = XVECEXP (vals, 0, 0);
-  bool all_same = true;
 
   /* This is a special vec_init<M><N> where N is not an element mode but a
      vector mode with half the elements of M.  We expect to find two entries
      of mode N in VALS and we must put their concatentation into TARGET.  */
-  if (XVECLEN (vals, 0) == 2 && VECTOR_MODE_P (GET_MODE (XVECEXP (vals, 0, 0))))
+  if (n_elts == 2
+      && VECTOR_MODE_P (GET_MODE (v0)))
     {
-      machine_mode narrow_mode = GET_MODE (XVECEXP (vals, 0, 0));
+      machine_mode narrow_mode = GET_MODE (v0);
       gcc_assert (GET_MODE_INNER (narrow_mode) == inner_mode
		   && known_eq (GET_MODE_SIZE (mode),
				2 * GET_MODE_SIZE (narrow_mode)));
-      emit_insn (gen_aarch64_vec_concat (narrow_mode, target,
-					 XVECEXP (vals, 0, 0),
+      emit_insn (gen_aarch64_vec_concat (narrow_mode, target, v0,
					  XVECEXP (vals, 0, 1)));
       return;
     }
 
-  /* Count the number of variable elements to initialise.  */
+  rtx_vector_builder builder (mode, n_elts, 1);
   for (int i = 0; i < n_elts; ++i)
+    builder.quick_push (XVECEXP (vals, 0, i));
+  builder.finalize ();
+
+  aarch64_expand_vector_init (target, builder);
+}
+
+static void
+aarch64_expand_vector_init (rtx target, rtx_vector_builder &v)
+{
+  machine_mode mode = GET_MODE (target);
+  scalar_mode inner_mode = GET_MODE_INNER (mode);
+  /* The number of vector elements which are not constant.  */
+  unsigned n_var = 0;
+  rtx any_const = NULL_RTX;
+  /* The first element of vals.  */
+  rtx v0 = v.elt (0);
+  /* Get the number of elements to insert into an Advanced SIMD vector.
+     If we have more than one element per pattern then we use the constant
+     number of elements in a full vector.
+     If we only have one element per pattern we use the number of patterns as
+     this may be lower than the number of elements in a full vector, which
+     means they repeat and we should use a duplicate of the smaller vector.  */
+  unsigned n_elts
+    = v.nelts_per_pattern () == 1 ? v.npatterns ()
+				  : v.full_nelts ().coeffs[0];
+
+  /* Count the number of variable elements to initialise.  */
+  for (unsigned i = 0; i < n_elts ; ++i)
     {
-      rtx x = XVECEXP (vals, 0, i);
+      rtx x = v.elt (i);
       if (!(CONST_INT_P (x) || CONST_DOUBLE_P (x)))
	++n_var;
       else
	any_const = x;
-
-      all_same &= rtx_equal_p (x, v0);
     }
 
   /* No variable elements, hand off to aarch64_simd_make_constant which knows
      how best to handle this.  */
   if (n_var == 0)
     {
-      rtx constant = aarch64_simd_make_constant (vals);
+      rtx constant = aarch64_simd_make_constant (v.build ());
       if (constant != NULL_RTX)
	{
	  emit_move_insn (target, constant);
@@ -21861,7 +21903,7 @@ aarch64_expand_vector_init (rtx target, rtx vals)
     }
 
   /* Splat a single non-constant element if we can.  */
-  if (all_same)
+  if (n_elts == 1)
     {
       rtx x = copy_to_mode_reg (inner_mode, v0);
       aarch64_emit_move (target, gen_vec_duplicate (mode, x));
@@ -21879,14 +21921,15 @@ aarch64_expand_vector_init (rtx target, rtx vals)
      and matches[X][1] with the count of duplicate elements (if X is the
      earliest element which has duplicates).  */
 
-  if (n_var == n_elts && n_elts <= 16)
+  if (n_var == n_elts)
     {
-      int matches[16][2] = {0};
-      for (int i = 0; i < n_elts; i++)
+      gcc_assert (n_elts <= 16);
+      unsigned matches[16][2] = {0};
+      for (unsigned i = 0; i < n_elts; i++)
	{
-	  for (int j = 0; j <= i; j++)
+	  for (unsigned j = 0; j <= i; j++)
	    {
-	      if (rtx_equal_p (XVECEXP (vals, 0, i), XVECEXP (vals, 0, j)))
+	      if (rtx_equal_p (v.elt (i), v.elt (j)))
		{
		  matches[i][0] = j;
		  matches[j][1]++;
@@ -21894,9 +21937,9 @@ aarch64_expand_vector_init (rtx target, rtx vals)
	    }
	}
-      int maxelement = 0;
-      int maxv = 0;
-      for (int i = 0; i < n_elts; i++)
+      unsigned maxelement = 0;
+      unsigned maxv = 0;
+      for (unsigned i = 0; i < n_elts; i++)
	if (matches[i][1] > maxv)
	  {
	    maxelement = i;
@@ -21915,8 +21958,8 @@ aarch64_expand_vector_init (rtx target, rtx vals)
	      || inner_mode == E_DFmode))
	{
-	  rtx x0 = XVECEXP (vals, 0, 0);
-	  rtx x1 = XVECEXP (vals, 0, 1);
+	  rtx x0 = v.elt (0);
+	  rtx x1 = v.elt (1);
	  /* Combine can pick up this case, but handling it directly
	     here leaves clearer RTL.
@@ -21939,24 +21982,26 @@ aarch64_expand_vector_init (rtx target, rtx vals)
	 vector register.  For big-endian we want that position to hold
	 the last element of VALS.  */
       maxelement = BYTES_BIG_ENDIAN ? n_elts - 1 : 0;
-      rtx x = copy_to_mode_reg (inner_mode, XVECEXP (vals, 0, maxelement));
+      rtx x = copy_to_mode_reg (inner_mode, v.elt (maxelement));
       aarch64_emit_move (target, lowpart_subreg (mode, x, inner_mode));
     }
   else
     {
-      rtx x = copy_to_mode_reg (inner_mode, XVECEXP (vals, 0, maxelement));
+      rtx x = copy_to_mode_reg (inner_mode, v.elt (maxelement));
       aarch64_emit_move (target, gen_vec_duplicate (mode, x));
     }
 
       /* Insert the rest.  */
-      for (int i = 0; i < n_elts; i++)
+      for (unsigned i = 0; i < n_elts; i++)
	{
-	  rtx x = XVECEXP (vals, 0, i);
+	  rtx x = v.elt (i);
	  if (matches[i][0] == maxelement)
	    continue;
	  x = copy_to_mode_reg (inner_mode, x);
	  emit_insn (GEN_FCN (icode) (target, x, GEN_INT (i)));
	}
+      if (!known_eq (v.full_nelts (), n_elts))
+	aarch64_vec_duplicate (target, mode, GET_MODE (v0), n_elts);
       return;
     }
 
@@ -21965,19 +22010,19 @@ aarch64_expand_vector_init (rtx target, rtx vals)
      can.  */
   if (n_var != n_elts)
     {
-      rtx copy = copy_rtx (vals);
+      rtx copy = v.build ();
 
       /* Load constant part of vector.  We really don't care what goes into the
	 parts we will overwrite, but we're more likely to be able to load the
	 constant efficiently if it has fewer, larger, repeating parts
	 (see aarch64_simd_valid_immediate).  */
-      for (int i = 0; i < n_elts; i++)
+      for (unsigned i = 0; i < n_elts; i++)
	{
-	  rtx x = XVECEXP (vals, 0, i);
+	  rtx x = XVECEXP (copy, 0, i);
	  if (CONST_INT_P (x) || CONST_DOUBLE_P (x))
	    continue;
	  rtx subst = any_const;
-	  for (int bit = n_elts / 2; bit > 0; bit /= 2)
+	  for (unsigned bit = n_elts / 2; bit > 0; bit /= 2)
	    {
	      /* Look in the copied vector, as more elements are const.  */
	      rtx test = XVECEXP (copy, 0, i ^ bit);
@@ -21989,18 +22034,21 @@ aarch64_expand_vector_init (rtx target, rtx vals)
	    }
	  XVECEXP (copy, 0, i) = subst;
	}
+      gcc_assert (GET_MODE (target) == GET_MODE (copy));
       aarch64_expand_vector_init (target, copy);
     }
 
   /* Insert the variable lanes directly.  */
-  for (int i = 0; i < n_elts; i++)
+  for (unsigned i = 0; i < n_elts; i++)
     {
-      rtx x = XVECEXP (vals, 0, i);
+      rtx x = v.elt (i);
       if (CONST_INT_P (x) || CONST_DOUBLE_P (x))
	continue;
       x = copy_to_mode_reg (inner_mode, x);
       emit_insn (GEN_FCN (icode) (target, x, GEN_INT (i)));
     }
+  if (!known_eq (v.full_nelts (), n_elts))
+    aarch64_vec_duplicate (target, mode, inner_mode, n_elts);
 }
 
 /* Emit RTL corresponding to:
diff --git a/gcc/testsuite/gcc.target/aarch64/ldp_stp_16.c b/gcc/testsuite/gcc.target/aarch64/ldp_stp_16.c
index 8ab117c4dcd7a731abc7e1b039e1faf0dfa09a5d..b307d2791824dd9c30200931452b2636708b5035 100644
--- a/gcc/testsuite/gcc.target/aarch64/ldp_stp_16.c
+++ b/gcc/testsuite/gcc.target/aarch64/ldp_stp_16.c
@@ -96,8 +96,8 @@ CONS2_FN (4, float);
 
 /*
 ** cons2_8_float:
-**	dup	v([0-9]+)\.4s, .*
-**	...
+**	ins	v0\.s\[1\], v1\.s\[0\]
+**	dup	v([0-9]+)\.2d, v0\.d\[0\]
 **	stp	q\1, q\1, \[x0\]
 **	stp	q\1, q\1, \[x0, #?32\]
 **	ret
diff --git a/gcc/testsuite/gcc.target/aarch64/vect_init.c b/gcc/testsuite/gcc.target/aarch64/vect_init.c
new file mode 100644
index 0000000000000000000000000000000000000000..546e44e96f4db60d289b4bc0ebfecbe18c81b4cc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vect_init.c
@@ -0,0 +1,144 @@
+#include <arm_neon.h>
+
+/*
+** int32_0:
+**	fmov	s0, w0
+**	ins	v0.s\[1\], w1
+**	dup	v0.2d, v0.d\[0\]
+**	ret
+*/
+
+int32x4_t int32_0 (int a, int b)
+{
+  int32x4_t v = {a, b, a, b};
+  return v;
+}
+
+/*
+** int32_1:
+**	dup	v0.4s, w0
+**	ret
+*/
+
+int32x4_t int32_1 (int a)
+{
+  int32x4_t v = {a, a, a, a};
+  return v;
+}
+
+/*
+** int16_0:
+**	sxth	w0, w0
+**	fmov	s0, w0
+**	ins	v0.h\[1\], w1
+**	ins	v0.h\[2\], w2
+**	ins	v0.h\[3\], w3
+**	dup	v0.2d, v0.d\[0\]
+**	ret
+*/
+
+int16x8_t int16_0 (int16_t a, int16_t b, int16_t c, int16_t d)
+{
+  int16x8_t v = {a, b, c, d,
+		 a, b, c, d};
+  return v;
+}
+
+/*
+** int16_1:
+**	sxth	w0, w0
+**	fmov	s0, w0
+**	ins	v0.h\[1\], w1
+**	dup	v0.4s, v0.s\[0\]
+**	ret
+*/
+
+int16x8_t int16_1 (int16_t a, int16_t b)
+{
+  int16x8_t v = {a, b, a, b,
+		 a, b, a, b};
+  return v;
+}
+
+/*
+** int16_2:
+**	dup	v0.8h, w0
+**	ret
+*/
+
+int16x8_t int16_2 (int16_t a)
+{
+  int16x8_t v = {a, a, a, a,
+		 a, a, a, a};
+  return v;
+}
+
+/*
+** int8_0:
+**	sxtb	w0, w0
+**	fmov	s0, w0
+**	ins	v0.b\[1\], w1
+**	ins	v0.b\[2\], w2
+**	ins	v0.b\[3\], w3
+**	ins	v0.b\[4\], w4
+**	ins	v0.b\[5\], w5
+**	ins	v0.b\[6\], w6
+**	ins	v0.b\[7\], w7
+**	dup	v0.2d, v0.d\[0\]
+**	ret
+*/
+
+int8x16_t int8_0 (int8_t a, int8_t b, int8_t c, int8_t d, int8_t e, int8_t f,
+		  int8_t g, int8_t h)
+{
+  int8x16_t v = {a, b, c, d, e, f, g, h,
+		 a, b, c, d, e, f, g, h};
+  return v;
+}
+
+/*
+** int8_1:
+**	sxtb	w0, w0
+**	fmov	s0, w0
+**	ins	v0.b\[1\], w1
+**	ins	v0.b\[2\], w2
+**	ins	v0.b\[3\], w3
+**	dup	v0.4s, v0.s\[0\]
+**	ret
+*/
+
+int8x16_t int8_1 (int8_t a, int8_t b, int8_t c, int8_t d)
+{
+  int8x16_t v = {a, b, c, d, a, b, c, d,
+		 a, b, c, d, a, b, c, d};
+  return v;
+}
+
+/*
+** int8_2:
+**	sxtb	w0, w0
+**	fmov	s0, w0
+**	ins	v0.b\[1\], w1
+**	dup	v0.8h, v0.h\[0\]
+**	ret
+*/
+
+int8x16_t int8_2 (int8_t a, int8_t b)
+{
+  int8x16_t v = {a, b, a, b, a, b, a, b,
+		 a, b, a, b, a, b, a, b};
+  return v;
+}
+
+/*
+** int8_3:
+**	dup	v0.16b, w0
+**	ret
+*/
+
+int8x16_t int8_3 (int8_t a)
+{
+  int8x16_t v = {a, a, a, a, a, a, a, a,
+		 a, a, a, a, a, a, a, a};
+  return v;
+}