RISC-V: Implement vector "average" autovec pattern.

Message ID 163c273d-3c01-8ece-21a5-b6ce88174ac0@gmail.com
State Unresolved
Series RISC-V: Implement vector "average" autovec pattern.

Checks

Context                 Check    Description
snail/gcc-patch-check   warning  Git am fail log

Commit Message

Robin Dapp Aug. 1, 2023, 2:31 p.m. UTC
  Hi,

this patch adds the vector average patterns:

 op[0] = (narrow) (((wide) op[1] + (wide) op[2]) >> 1);
 op[0] = (narrow) (((wide) op[1] + (wide) op[2] + 1) >> 1);

If there is no direct support, the vectorizer can synthesize the patterns
but, presumably due to lack of narrowing operation support, won't try
a narrowing shift.  Therefore, this patch implements the expanders instead.
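
For reference, the two patterns can be written out in scalar C.  This is
only a sketch for 8-bit elements: the function names are made up for
illustration and do not appear in the patch, and the signed variants rely
on GCC's arithmetic right shift of negative values.

```c
#include <assert.h>
#include <stdint.h>

/* Floor average: widen, add, shift right by one, then narrow.  */
static uint8_t avg_floor_u8 (uint8_t a, uint8_t b)
{
  uint16_t wide = (uint16_t) ((uint16_t) a + (uint16_t) b);
  return (uint8_t) (wide >> 1);
}

/* Ceil average: same as above, with +1 before the shift.  */
static uint8_t avg_ceil_u8 (uint8_t a, uint8_t b)
{
  uint16_t wide = (uint16_t) ((uint16_t) a + (uint16_t) b + 1);
  return (uint8_t) (wide >> 1);
}

/* Signed variants.  Right-shifting a negative value is
   implementation-defined in C; GCC performs an arithmetic shift,
   which is what this sketch assumes.  */
static int8_t avg_floor_s8 (int8_t a, int8_t b)
{
  int16_t wide = (int16_t) ((int16_t) a + (int16_t) b);
  return (int8_t) (wide >> 1);
}

static int8_t avg_ceil_s8 (int8_t a, int8_t b)
{
  int16_t wide = (int16_t) ((int16_t) a + (int16_t) b + 1);
  return (int8_t) (wide >> 1);
}

/* Exhaustive check of the unsigned variants against plain int
   arithmetic, which cannot overflow for 8-bit inputs.  */
static void avg_self_check (void)
{
  for (int a = 0; a < 256; a++)
    for (int b = 0; b < 256; b++)
      {
	assert (avg_floor_u8 ((uint8_t) a, (uint8_t) b) == (a + b) / 2);
	assert (avg_ceil_u8 ((uint8_t) a, (uint8_t) b) == (a + b + 1) / 2);
      }
}
```

The intermediate sum needs one extra bit, which is why the addition is
done in the wide type and only the shifted result is narrowed.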

A synthesized pattern results in e.g.:
	vsrl.vi	v2,v1,1
	vsrl.vi	v4,v3,1
	vand.vv	v1,v1,v3
	vadd.vv	v2,v2,v4
	vand.vi	v1,v1,1
	vadd.vv	v1,v2,v1

With this patch we generate:
	vwadd.vv	v2,v4,v1
	vadd.vi		v2,v2,1
	vnsrl.wi	v2,v2,1

We manage to recover (i.e. create the latter sequence) for signed types
but not for unsigned.  I figured that offering both patterns might be the
safe thing to do, but I am open to leaving the signed one out.  In the long
term we'd want full vectorizer support for this, I suppose.
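
As a scalar illustration of the two sequences: the six-instruction
fallback appears to compute the floor average without widening, as
(a >> 1) + (b >> 1) + (a & b & 1), while the expander widens, adds and
narrows.  The sketch below assumes unsigned 8-bit elements; the function
names are hypothetical and the instruction comments are only a rough
mapping (the unsigned widening add is vwaddu.vv).

```c
#include <assert.h>
#include <stdint.h>

/* Non-widening form, mirroring the synthesized sequence.  */
static uint8_t avg_floor_synth_u8 (uint8_t a, uint8_t b)
{
  uint8_t half_a = a >> 1;		/* vsrl.vi */
  uint8_t half_b = b >> 1;		/* vsrl.vi */
  uint8_t carry = (a & b) & 1;		/* vand.vv, vand.vi */
  return (uint8_t) (half_a + half_b + carry);	/* vadd.vv, vadd.vv */
}

/* Widening form: widening add followed by a narrowing shift.  */
static uint8_t avg_floor_widen_u8 (uint8_t a, uint8_t b)
{
  uint16_t wide = (uint16_t) ((uint16_t) a + (uint16_t) b); /* vwaddu.vv */
  return (uint8_t) (wide >> 1);				     /* vnsrl.wi */
}

/* Check that both forms agree for every pair of 8-bit inputs.  */
static void check_equivalence (void)
{
  for (unsigned a = 0; a < 256; a++)
    for (unsigned b = 0; b < 256; b++)
      assert (avg_floor_synth_u8 ((uint8_t) a, (uint8_t) b)
	      == avg_floor_widen_u8 ((uint8_t) a, (uint8_t) b));
}
```

Both forms give the same result; the widening form simply needs fewer
operations when a narrowing shift is available.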

Regards
 Robin

gcc/ChangeLog:

	* config/riscv/autovec.md (<u>avg<v_double_trunc>3_floor):
	Implement expander.
	(<u>avg<v_double_trunc>3_ceil): Ditto.
	* config/riscv/vector-iterators.md (ext_to_rshift): New code attribute.
	(EXT_TO_RSHIFT): Ditto.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/autovec/vec-avg-run.c: New test.
	* gcc.target/riscv/rvv/autovec/vec-avg-rv32gcv.c: New test.
	* gcc.target/riscv/rvv/autovec/vec-avg-rv64gcv.c: New test.
	* gcc.target/riscv/rvv/autovec/vec-avg-template.h: New test.
---
 gcc/config/riscv/autovec.md                   | 66 ++++++++++++++
 gcc/config/riscv/vector-iterators.md          |  5 ++
 .../riscv/rvv/autovec/vec-avg-run.c           | 85 +++++++++++++++++++
 .../riscv/rvv/autovec/vec-avg-rv32gcv.c       | 10 +++
 .../riscv/rvv/autovec/vec-avg-rv64gcv.c       | 10 +++
 .../riscv/rvv/autovec/vec-avg-template.h      | 33 +++++++
 6 files changed, 209 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vec-avg-run.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vec-avg-rv32gcv.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vec-avg-rv64gcv.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vec-avg-template.h
  

Comments

juzhe.zhong@rivai.ai Aug. 2, 2023, 2:03 p.m. UTC | #1
I have two concerns:

1. How do you model round to -Inf (avg_floor) and round to +Inf (avg_ceil)?
2. Is it possible we could use vaadd[u] to model avg?



  
Robin Dapp Aug. 2, 2023, 6:49 p.m. UTC | #2
> 1. How do you model round to -Inf (avg_floor) and round to +Inf (avg_ceil)?

That's just specified by the +1 or the lack of it in the original pattern.
Actually the IFN is just a detour because we would create perfect code
if not for the fallback.  But as there is currently no way to check for
the existence of a narrowing shift we cannot circumvent the fallback.

> 2. Is it possible we could use vaadd[u] to model avg?
In principle yes (I first misread the spec as requiring that overflow must
not happen, but it actually says that overflow does not happen).
However, we would need to set a rounding mode before vaadd or check its current
value and provide a fallback.  Off the top of my head I can't think of a
workaround like two vaadds or so.
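
To illustrate the dependence on the rounding mode, here is a small
software model of a single 8-bit vaaddu lane.  It is a sketch based on a
reading of the RVV fixed-point rounding rules, not code from the patch and
not an intrinsic: model_vaaddu_u8 and the enum names are hypothetical, and
only the two vxrm modes relevant to avg_floor and avg_ceil are modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Two of the four vxrm rounding modes, with their CSR encodings.  */
enum vxrm_mode
{
  VXRM_RNU = 0,	/* round-to-nearest-up: add the discarded bit */
  VXRM_RDN = 2	/* round-down: drop the discarded bit */
};

/* Model of one vaaddu element: the sum is formed without overflow,
   shifted right by one, and rounded according to MODE.  */
static uint8_t model_vaaddu_u8 (uint8_t a, uint8_t b, enum vxrm_mode mode)
{
  uint16_t sum = (uint16_t) ((uint16_t) a + (uint16_t) b);
  uint16_t discarded = sum & 1;	/* the bit shifted out */
  uint16_t res = sum >> 1;
  if (mode == VXRM_RNU)
    res = (uint16_t) (res + discarded);
  return (uint8_t) res;
}

/* rdn matches the floor pattern, rnu matches the ceil pattern.  */
static void check_vaaddu_model (void)
{
  for (int a = 0; a < 256; a++)
    for (int b = 0; b < 256; b++)
      {
	assert (model_vaaddu_u8 ((uint8_t) a, (uint8_t) b, VXRM_RDN)
		== ((a + b) >> 1));
	assert (model_vaaddu_u8 ((uint8_t) a, (uint8_t) b, VXRM_RNU)
		== ((a + b + 1) >> 1));
      }
}
```

In this model a single instruction would suffice for either pattern, but
only if the rounding mode is known to be rdn (floor) or rnu (ceil) at that
point, which is the difficulty described above.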

Regards
 Robin
  
juzhe.zhong@rivai.ai Aug. 2, 2023, 9:36 p.m. UTC | #3
I just checked LLVM:
https://godbolt.org/z/nMa6qnEeT 

This patch is generally reasonable, so LGTM.



  
juzhe.zhong@rivai.ai Aug. 2, 2023, 9:44 p.m. UTC | #4
Please put your test cases into:

# widening operation only test on LMUL < 8
set AUTOVEC_TEST_OPTS [list \
  {-ftree-vectorize -O3 --param riscv-autovec-lmul=m1} \
  {-ftree-vectorize -O3 --param riscv-autovec-lmul=m2} \
  {-ftree-vectorize -O3 --param riscv-autovec-lmul=m4} \
  {-ftree-vectorize -O2 --param riscv-autovec-lmul=m1} \
  {-ftree-vectorize -O2 --param riscv-autovec-lmul=m2} \
  {-ftree-vectorize -O2 --param riscv-autovec-lmul=m4} ]
foreach op $AUTOVEC_TEST_OPTS {
  dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/autovec/widen/*.\[cS\]]] \
    "" "$op"
}

You could either simply put them into the "widen" directory or create a new directory.
Anyway, make sure you have fully tested it with LMUL = 1/2/4.



  
Robin Dapp Aug. 15, 2023, 11:36 a.m. UTC | #5
> Please put your test cases into:
> 
> # widening operation only test on LMUL < 8
> set AUTOVEC_TEST_OPTS [list \
>   {-ftree-vectorize -O3 --param riscv-autovec-lmul=m1} \
>   {-ftree-vectorize -O3 --param riscv-autovec-lmul=m2} \
>   {-ftree-vectorize -O3 --param riscv-autovec-lmul=m4} \
>   {-ftree-vectorize -O2 --param riscv-autovec-lmul=m1} \
>   {-ftree-vectorize -O2 --param riscv-autovec-lmul=m2} \
>   {-ftree-vectorize -O2 --param riscv-autovec-lmul=m4} ]
> foreach op $AUTOVEC_TEST_OPTS {
>   dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/autovec/widen/*.\[cS\]]] \
>     "" "$op"
> }
> 
> You could either simply put them into the "widen" directory or create a new directory.
> Anyway, make sure you have fully tested it with LMUL = 1/2/4.

Ah, almost forgot this.  I moved the tests to the widen directory
and will push it after testing.

Regards
 Robin
  

Patch

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 7b784437c7e..23d3c2feaff 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -1752,3 +1752,69 @@  (define_expand "mask_len_fold_left_plus_<mode>"
 				    riscv_vector::reduction_type::MASK_LEN_FOLD_LEFT);
   DONE;
 })
+
+;; -------------------------------------------------------------------------
+;; ---- [INT] Average.
+;; -------------------------------------------------------------------------
+;; Implements the following "average" patterns:
+;; floor:
+;;  op[0] = (narrow) (((wide) op[1] + (wide) op[2]) >> 1);
+;; ceil:
+;;  op[0] = (narrow) (((wide) op[1] + (wide) op[2] + 1) >> 1);
+;; -------------------------------------------------------------------------
+
+(define_expand "<u>avg<v_double_trunc>3_floor"
+ [(set (match_operand:<V_DOUBLE_TRUNC> 0 "register_operand")
+   (truncate:<V_DOUBLE_TRUNC>
+    (<ext_to_rshift>:VWEXTI
+     (plus:VWEXTI
+      (any_extend:VWEXTI
+       (match_operand:<V_DOUBLE_TRUNC> 1 "register_operand"))
+      (any_extend:VWEXTI
+       (match_operand:<V_DOUBLE_TRUNC> 2 "register_operand"))))))]
+  "TARGET_VECTOR"
+{
+  /* First emit a widening addition.  */
+  rtx tmp1 = gen_reg_rtx (<MODE>mode);
+  rtx ops1[] = {tmp1, operands[1], operands[2]};
+  insn_code icode = code_for_pred_dual_widen (PLUS, <CODE>, <MODE>mode);
+  riscv_vector::emit_vlmax_insn (icode, riscv_vector::RVV_BINOP, ops1);
+
+  /* Then a narrowing shift.  */
+  rtx ops2[] = {operands[0], tmp1, const1_rtx};
+  icode = code_for_pred_narrow_scalar (<EXT_TO_RSHIFT>, <MODE>mode);
+  riscv_vector::emit_vlmax_insn (icode, riscv_vector::RVV_BINOP, ops2);
+  DONE;
+})
+
+(define_expand "<u>avg<v_double_trunc>3_ceil"
+ [(set (match_operand:<V_DOUBLE_TRUNC> 0 "register_operand")
+   (truncate:<V_DOUBLE_TRUNC>
+    (<ext_to_rshift>:VWEXTI
+     (plus:VWEXTI
+      (plus:VWEXTI
+       (any_extend:VWEXTI
+	(match_operand:<V_DOUBLE_TRUNC> 1 "register_operand"))
+       (any_extend:VWEXTI
+	(match_operand:<V_DOUBLE_TRUNC> 2 "register_operand")))
+      (const_int 1)))))]
+  "TARGET_VECTOR"
+{
+  /* First emit a widening addition.  */
+  rtx tmp1 = gen_reg_rtx (<MODE>mode);
+  rtx ops1[] = {tmp1, operands[1], operands[2]};
+  insn_code icode = code_for_pred_dual_widen (PLUS, <CODE>, <MODE>mode);
+  riscv_vector::emit_vlmax_insn (icode, riscv_vector::RVV_BINOP, ops1);
+
+  /* Then add 1.  */
+  rtx tmp2 = gen_reg_rtx (<MODE>mode);
+  rtx ops2[] = {tmp2, tmp1, const1_rtx};
+  icode = code_for_pred_scalar (PLUS, <MODE>mode);
+  riscv_vector::emit_vlmax_insn (icode, riscv_vector::RVV_BINOP, ops2);
+
+  /* Finally, a narrowing shift.  */
+  rtx ops3[] = {operands[0], tmp2, const1_rtx};
+  icode = code_for_pred_narrow_scalar (<EXT_TO_RSHIFT>, <MODE>mode);
+  riscv_vector::emit_vlmax_insn (icode, riscv_vector::RVV_BINOP, ops3);
+  DONE;
+})
diff --git a/gcc/config/riscv/vector-iterators.md b/gcc/config/riscv/vector-iterators.md
index 37c6337f1a3..409f63332c9 100644
--- a/gcc/config/riscv/vector-iterators.md
+++ b/gcc/config/riscv/vector-iterators.md
@@ -1968,6 +1968,11 @@  (define_code_attr macc_msac [(plus "macc") (minus "msac")])
 (define_code_attr nmsub_nmadd [(plus "nmsub") (minus "nmadd")])
 (define_code_attr nmsac_nmacc [(plus "nmsac") (minus "nmacc")])
 
+(define_code_attr ext_to_rshift [(sign_extend "ashiftrt")
+                                 (zero_extend "lshiftrt")])
+(define_code_attr EXT_TO_RSHIFT [(sign_extend "ASHIFTRT")
+                                 (zero_extend "LSHIFTRT")])
+
 (define_code_iterator and_ior [and ior])
 
 (define_code_iterator any_float_binop [plus mult minus div])
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vec-avg-run.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vec-avg-run.c
new file mode 100644
index 00000000000..7ca193ec2f2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vec-avg-run.c
@@ -0,0 +1,85 @@ 
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-additional-options "-std=c99 -fno-vect-cost-model --param=riscv-autovec-preference=scalable -lm" } */
+
+#include <limits.h>
+#include <math.h>
+#include <assert.h>
+
+#include "vec-avg-template.h"
+
+#define SZ 256
+
+#define RUNS1(TYPE, SCALE)                                                     \
+  TYPE a##TYPE[SZ + 1];                                                        \
+  TYPE b##TYPE[SZ + 1];                                                        \
+  TYPE dst##TYPE[SZ + 1];                                                      \
+  for (int cnt = 0, i = -(SZ * SCALE) / 2; i < (SZ * SCALE) / 2; i += SCALE)   \
+    {                                                                          \
+      a##TYPE[cnt] = i;                                                        \
+      b##TYPE[cnt] = i + 1;                                                    \
+      dst##TYPE[cnt++] = 0;                                                    \
+    }                                                                          \
+  vavg_##TYPE (dst##TYPE, a##TYPE, b##TYPE, SZ);                               \
+  for (int i = 0; i < SZ; i++)                                                 \
+    assert (dst##TYPE[i] == floor ((a##TYPE[i] + b##TYPE[i]) / 2.0));
+
+#define RUNU1(TYPE, SCALE)                                                     \
+  TYPE a##TYPE[SZ + 1];                                                        \
+  TYPE b##TYPE[SZ + 1];                                                        \
+  TYPE dst##TYPE[SZ + 1];                                                      \
+  for (int cnt = 0, i = 0; i < (SZ * SCALE); i += SCALE)                       \
+    {                                                                          \
+      a##TYPE[cnt] = i;                                                        \
+      b##TYPE[cnt] = i + 1;                                                    \
+      dst##TYPE[cnt++] = 0;                                                    \
+    }                                                                          \
+  vavg_##TYPE (dst##TYPE, a##TYPE, b##TYPE, SZ);                               \
+  for (int i = 0; i < SZ; i++)                                                 \
+    assert (dst##TYPE[i] == floor ((a##TYPE[i] + b##TYPE[i]) / 2.0));
+
+#define RUNS2(TYPE, SCALE)                                                     \
+  TYPE a2##TYPE[SZ + 1];                                                       \
+  TYPE b2##TYPE[SZ + 1];                                                       \
+  TYPE dst2##TYPE[SZ + 1];                                                     \
+  for (int cnt = 0, i = -(SZ * SCALE) / 2; i < (SZ * SCALE) / 2; i += SCALE)   \
+    {                                                                          \
+      a2##TYPE[cnt] = i;                                                       \
+      b2##TYPE[cnt] = i + 1;                                                   \
+      dst2##TYPE[cnt++] = 0;                                                   \
+    }                                                                          \
+  vavg2_##TYPE (dst2##TYPE, a2##TYPE, b2##TYPE, SZ);                           \
+  for (int i = 0; i < SZ; i++)                                                 \
+    assert (dst2##TYPE[i] == ceil ((a2##TYPE[i] + b2##TYPE[i]) / 2.0));
+
+#define RUNU2(TYPE, SCALE)                                                     \
+  TYPE a2##TYPE[SZ + 1];                                                       \
+  TYPE b2##TYPE[SZ + 1];                                                       \
+  TYPE dst2##TYPE[SZ + 1];                                                     \
+  for (int cnt = 0, i = 0; i < (SZ * SCALE); i += SCALE)                       \
+    {                                                                          \
+      a2##TYPE[cnt] = i;                                                       \
+      b2##TYPE[cnt] = i + 1;                                                   \
+      dst2##TYPE[cnt++] = 0;                                                   \
+    }                                                                          \
+  vavg2_##TYPE (dst2##TYPE, a2##TYPE, b2##TYPE, SZ);                           \
+  for (int i = 0; i < SZ; i++)                                                 \
+    assert (dst2##TYPE[i] == ceil ((a2##TYPE[i] + b2##TYPE[i]) / 2.0));
+
+#define RUN_ALL()                                                              \
+  RUNS1 (int8_t, 1)                                                            \
+  RUNS1 (int16_t, 256)                                                         \
+  RUNS1 (int32_t, 65536)                                                       \
+  RUNU1 (uint8_t, 1)                                                           \
+  RUNU1 (uint16_t, 256)                                                        \
+  RUNU1 (uint32_t, 65536)                                                      \
+  RUNS2 (int8_t, 1)                                                            \
+  RUNS2 (int16_t, 256)                                                         \
+  RUNS2 (int32_t, 65536)                                                       \
+  RUNU2 (uint8_t, 1)                                                           \
+  RUNU2 (uint16_t, 256)                                                        \
+  RUNU2 (uint32_t, 65536)
+
+int main ()
+{
+  RUN_ALL ()
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vec-avg-rv32gcv.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vec-avg-rv32gcv.c
new file mode 100644
index 00000000000..e2754339d94
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vec-avg-rv32gcv.c
@@ -0,0 +1,10 @@ 
+/* { dg-do compile } */
+/* { dg-additional-options "-std=c99 -fno-vect-cost-model -march=rv32gcv -mabi=ilp32d --param=riscv-autovec-preference=scalable" } */
+
+#include "vec-avg-template.h"
+
+/* { dg-final { scan-assembler-times {\tvwadd\.vv} 6 } } */
+/* { dg-final { scan-assembler-times {\tvwaddu\.vv} 6 } } */
+/* { dg-final { scan-assembler-times {\tvadd\.vi} 6 } } */
+/* { dg-final { scan-assembler-times {\tvnsrl\.wi} 6 } } */
+/* { dg-final { scan-assembler-times {\tvnsra\.wi} 6 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vec-avg-rv64gcv.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vec-avg-rv64gcv.c
new file mode 100644
index 00000000000..210c0dc5460
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vec-avg-rv64gcv.c
@@ -0,0 +1,10 @@ 
+/* { dg-do compile } */
+/* { dg-additional-options "-std=c99 -fno-vect-cost-model -march=rv64gcv_zvfh -mabi=lp64d --param=riscv-autovec-preference=scalable" } */
+
+#include "vec-avg-template.h"
+
+/* { dg-final { scan-assembler-times {\tvwadd\.vv} 6 } } */
+/* { dg-final { scan-assembler-times {\tvwaddu\.vv} 6 } } */
+/* { dg-final { scan-assembler-times {\tvadd\.vi} 6 } } */
+/* { dg-final { scan-assembler-times {\tvnsrl\.wi} 6 } } */
+/* { dg-final { scan-assembler-times {\tvnsra\.wi} 6 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vec-avg-template.h b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vec-avg-template.h
new file mode 100644
index 00000000000..9c2a6f1b9cb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vec-avg-template.h
@@ -0,0 +1,33 @@ 
+#include <stdint-gcc.h>
+
+#define TEST_TYPE(TYPE, TYPE2)                                                 \
+  __attribute__ ((noipa)) void vavg_##TYPE (TYPE *dst, TYPE *a, TYPE *b,       \
+					    int n)                             \
+  {                                                                            \
+    for (int i = 0; i < n; i++)                                                \
+      dst[i] = ((TYPE2) a[i] + b[i]) >> 1;                                     \
+  }
+
+#define TEST_TYPE2(TYPE, TYPE2)                                                \
+  __attribute__ ((noipa)) void vavg2_##TYPE (TYPE *dst, TYPE *a, TYPE *b,      \
+					     int n)                            \
+  {                                                                            \
+    for (int i = 0; i < n; i++)                                                \
+      dst[i] = ((TYPE2) a[i] + b[i] + 1) >> 1;                                 \
+  }
+
+#define TEST_ALL()                                                             \
+  TEST_TYPE (int8_t, int16_t)                                                  \
+  TEST_TYPE (uint8_t, uint16_t)                                                \
+  TEST_TYPE (int16_t, int32_t)                                                 \
+  TEST_TYPE (uint16_t, uint32_t)                                               \
+  TEST_TYPE (int32_t, int64_t)                                                 \
+  TEST_TYPE (uint32_t, uint64_t)                                               \
+  TEST_TYPE2 (int8_t, int16_t)                                                 \
+  TEST_TYPE2 (uint8_t, uint16_t)                                               \
+  TEST_TYPE2 (int16_t, int32_t)                                                \
+  TEST_TYPE2 (uint16_t, uint32_t)                                              \
+  TEST_TYPE2 (int32_t, int64_t)                                                \
+  TEST_TYPE2 (uint32_t, uint64_t)
+
+TEST_ALL ()