[v4,04/10] RISC-V:autovec: Add target vectorization hooks
Commit Message
2023-03-02 Michael Collison <collison@rivosinc.com>
Juzhe Zhong <juzhe.zhong@rivai.ai>
* config/riscv/riscv.cc (riscv_option_override):
Set riscv_vectorization_factor.
(riscv_estimated_poly_value): Implement
TARGET_ESTIMATED_POLY_VALUE.
(riscv_preferred_simd_mode): Implement
TARGET_VECTORIZE_PREFERRED_SIMD_MODE.
(riscv_autovectorize_vector_modes): Implement
TARGET_AUTOVECTORIZE_VECTOR_MODES.
(riscv_get_mask_mode): Implement TARGET_VECTORIZE_GET_MASK_MODE.
(riscv_empty_mask_is_expensive): Implement
TARGET_VECTORIZE_EMPTY_MASK_IS_EXPENSIVE.
(riscv_vectorize_create_costs): Implement
TARGET_VECTORIZE_CREATE_COSTS.
(TARGET_ESTIMATED_POLY_VALUE): Register target macro.
(TARGET_VECTORIZE_PREFERRED_SIMD_MODE): Ditto.
(TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES): Ditto.
(TARGET_VECTORIZE_GET_MASK_MODE): Ditto.
(TARGET_VECTORIZE_EMPTY_MASK_IS_EXPENSIVE): Ditto.
(TARGET_VECTORIZE_LOOP_LEN_OVERRIDE_MASK): Ditto.
---
gcc/config/riscv/riscv.cc | 156 ++++++++++++++++++++++++++++++++++++++
1 file changed, 156 insertions(+)
Comments
> +/* Implement TARGET_ESTIMATED_POLY_VALUE.
> + Look into the tuning structure for an estimate.
> + KIND specifies the type of requested estimate: min, max or likely.
> + For cores with a known RVV width all three estimates are the same.
> + For generic RVV tuning we want to distinguish the maximum estimate from
> + the minimum and likely ones.
> + The likely estimate is the same as the minimum in that case to give a
> + conservative behavior of auto-vectorizing with RVV when it is a win
> + even for 128-bit RVV.
> + When RVV width information is available VAL.coeffs[1] is multiplied by
> + the number of VQ chunks over the initial Advanced SIMD 128 bits. */
> +
> +static HOST_WIDE_INT
> +riscv_estimated_poly_value (poly_int64 val,
> + poly_value_estimate_kind kind = POLY_VALUE_LIKELY)
> +{
> + unsigned int width_source = BITS_PER_RISCV_VECTOR.is_constant ()
> + ? (unsigned int) BITS_PER_RISCV_VECTOR.to_constant ()
> + : (unsigned int) RVV_SCALABLE;
For now this can only be RVV_SCALABLE, so I would prefer to keep only
that switch for now, and to add
assert (!BITS_PER_RISCV_VECTOR.is_constant ());
> +
> + /* If there is no core-specific information then the minimum and likely
> + values are based on 128-bit vectors and the maximum is based on
> + the architectural maximum of 2048 bits. */
The maximum is 65,536 bits per the vector spec.
> + if (width_source == RVV_SCALABLE)
> + switch (kind)
> + {
> + case POLY_VALUE_MIN:
> + case POLY_VALUE_LIKELY:
> + return val.coeffs[0];
> +
> + case POLY_VALUE_MAX:
> + return val.coeffs[0] + val.coeffs[1] * 15;
> + }
> +
> + /* Allow BITS_PER_RISCV_VECTOR to be a bitmask of different VL, treating the
> + lowest as likely. This could be made more general if future -mtune
> + options need it to be. */
> + if (kind == POLY_VALUE_MAX)
> + width_source = 1 << floor_log2 (width_source);
> + else
> + width_source = least_bit_hwi (width_source);
> +
> + /* If the core provides width information, use that. */
> + HOST_WIDE_INT over_128 = width_source - 128;
> + return val.coeffs[0] + val.coeffs[1] * over_128 / 128;
> +}
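To make the arithmetic in the review comments concrete, here is a minimal standalone sketch of a scalable-only estimate. It is not GCC code: `PolyValue`, `EstimateKind`, `kMinVectorBits`, `kMaxVectorBits` and the `width_is_constant` flag are hypothetical stand-ins, and it assumes the 128-bit base granule from the quoted comment together with the 65,536-bit maximum raised in the review.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a two-coefficient poly_int64:
// value = coeffs[0] + coeffs[1] * x, where x counts extra 128-bit granules.
struct PolyValue {
  int64_t coeffs[2];
};

enum class EstimateKind { Min, Max, Likely };

// Assumed bounds: the 128-bit minimum from the quoted comment and the
// 65,536-bit architectural maximum mentioned in the review.
constexpr int64_t kMinVectorBits = 128;
constexpr int64_t kMaxVectorBits = 65536;

// Scalable-only estimate: min and likely assume the smallest vector,
// max assumes the largest one the architecture allows.
int64_t estimated_poly_value(const PolyValue &val, EstimateKind kind,
                             bool width_is_constant = false) {
  // Models the suggested assert (!BITS_PER_RISCV_VECTOR.is_constant ()).
  assert(!width_is_constant);
  switch (kind) {
    case EstimateKind::Min:
    case EstimateKind::Likely:
      return val.coeffs[0];
    case EstimateKind::Max:
      return val.coeffs[0]
             + val.coeffs[1] * (kMaxVectorBits / kMinVectorBits - 1);
  }
  return val.coeffs[0];  // Not reached; keeps compilers quiet.
}
```

Under these assumptions the multiplier for the maximum becomes 65536 / 128 - 1 = 511, whereas the 15 in the quoted code corresponds to a 2048-bit maximum (2048 / 128 - 1).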
> +
> +/* Implement TARGET_VECTORIZE_PREFERRED_SIMD_MODE. */
> +
> +static machine_mode
> +riscv_preferred_simd_mode (scalar_mode mode)
> +{
> + machine_mode vmode =
> + riscv_vector::riscv_vector_preferred_simd_mode (mode,
> + riscv_vectorization_factor);
> + if (VECTOR_MODE_P (vmode))
> + return vmode;
> +
> + return word_mode;
> +}
> +
> +/* Implement TARGET_AUTOVECTORIZE_VECTOR_MODES for RVV. */
> +static unsigned int
> +riscv_autovectorize_vector_modes (vector_modes *modes, bool)
> +{
> + if (!TARGET_VECTOR)
> + return 0;
> +
> + if (riscv_vectorization_factor == RVV_LMUL1)
> + {
> + modes->safe_push (VNx16QImode);
> + modes->safe_push (VNx8QImode);
> + modes->safe_push (VNx4QImode);
> + modes->safe_push (VNx2QImode);
> + }
Keep only the LMUL1 case for the moment.
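A rough standalone sketch of what an LMUL1-only version of the hook could look like follows. It only models the control flow: `Lmul`, the `target_vector` flag and the use of `std::vector<std::string>` are hypothetical substitutes for GCC's `vector_modes`, `TARGET_VECTOR` and `riscv_vectorization_factor`, and the mode names are copied from the quoted patch.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the LMUL setting; not a real GCC type.
enum class Lmul { M1 = 1, M2 = 2, M4 = 4, M8 = 8 };

// Push candidate modes for LMUL1 only and leave the list untouched for
// every other setting. Returns 0, as the quoted hook does.
unsigned autovectorize_vector_modes(std::vector<std::string> &modes,
                                    bool target_vector, Lmul lmul) {
  if (!target_vector)
    return 0;

  if (lmul == Lmul::M1) {
    modes.push_back("VNx16QImode");
    modes.push_back("VNx8QImode");
    modes.push_back("VNx4QImode");
    modes.push_back("VNx2QImode");
  }

  return 0;
}
```

Other LMUL settings would simply get an empty candidate list in this sketch, so the remaining branches could be added in a later patch.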
Hi, Michael. Thanks for extracting patches from "rvv-next". I have several comments here:
1. I don't think it is appropriate or useful to support so many target hooks in the first auto-vec support patch.
Supporting only TARGET_VECTORIZE_PREFERRED_SIMD_MODE is enough; supporting too many
unused target hooks makes the patch messy and hard to trace.
2. TARGET_ESTIMATED_POLY_VALUE is not needed, since it's currently not used.
3. TARGET_AUTOVECTORIZE_VECTOR_MODES is not used in the first patch.
4. TARGET_VECTORIZE_GET_MASK_MODE and TARGET_VECTORIZE_EMPTY_MASK_IS_EXPENSIVE are used to
specify the mask mode for WHILE_ULT and comparison results.
These 2 target hooks are not used when you don't implement the WHILE_ULT/VCOND/VEC_CMP/.... patterns.
5. TARGET_VECTORIZE_LOOP_LEN_OVERRIDE_MASK is a target hook I added in rvv-next; it does not exist in upstream GCC.
You should not add it before I have supported it in upstream GCC.
....etc.
So, the basic idea is that you should only implement TARGET_VECTORIZE_PREFERRED_SIMD_MODE in the first patch enabling basic auto-vectorization.
It should be enough when we only implement simple len_load/len_store.
I have sent a patch for initial basic auto-vectorization:
https://gcc.gnu.org/pipermail/gcc-patches/2023-April/616223.html
juzhe.zhong@rivai.ai
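For readers unfamiliar with length-controlled loops of the kind len_load/len_store enable, the following is a scalar emulation in plain C++. It is only an illustration of the idea, not GCC internals or RVV intrinsics: `add_with_len` and its `vl_max` parameter (the number of elements one vector is assumed to hold) are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Scalar emulation of a length-controlled vector loop. Each outer
// iteration handles up to vl_max elements; the final iteration handles
// only the remainder, so no scalar epilogue loop and no mask is needed.
std::vector<int> add_with_len(const std::vector<int> &a,
                              const std::vector<int> &b,
                              std::size_t vl_max) {
  assert(vl_max > 0);  // A zero-length vector would never make progress.
  const std::size_t n = std::min(a.size(), b.size());
  std::vector<int> out(n);

  std::size_t i = 0;
  while (i < n) {
    // Active length for this iteration: a full vector or the remainder.
    const std::size_t len = std::min(vl_max, n - i);
    // Models "load len elements, add, store len elements".
    for (std::size_t k = 0; k < len; ++k)
      out[i + k] = a[i + k] + b[i + k];
    i += len;
  }
  return out;
}
```

With five elements and `vl_max` of 2, the loop in this sketch runs three times with active lengths 2, 2 and 1.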
From: Michael Collison
Date: 2023-04-18 02:36
To: gcc-patches
Subject: [PATCH v4 04/10] RISC-V:autovec: Add target vectorization hooks
2023-03-02 Michael Collison <collison@rivosinc.com>
Juzhe Zhong <juzhe.zhong@rivai.ai>
* config/riscv/riscv.cc (riscv_option_override):
Set riscv_vectorization_factor.
(riscv_estimated_poly_value): Implement
TARGET_ESTIMATED_POLY_VALUE.
(riscv_preferred_simd_mode): Implement
TARGET_VECTORIZE_PREFERRED_SIMD_MODE.
(riscv_autovectorize_vector_modes): Implement
TARGET_AUTOVECTORIZE_VECTOR_MODES.
(riscv_get_mask_mode): Implement TARGET_VECTORIZE_GET_MASK_MODE.
(riscv_empty_mask_is_expensive): Implement
TARGET_VECTORIZE_EMPTY_MASK_IS_EXPENSIVE.
(riscv_vectorize_create_costs): Implement
TARGET_VECTORIZE_CREATE_COSTS.
(TARGET_ESTIMATED_POLY_VALUE): Register target macro.
(TARGET_VECTORIZE_PREFERRED_SIMD_MODE): Ditto.
(TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES): Ditto.
(TARGET_VECTORIZE_GET_MASK_MODE): Ditto.
(TARGET_VECTORIZE_EMPTY_MASK_IS_EXPENSIVE): Ditto.
(TARGET_VECTORIZE_LOOP_LEN_OVERRIDE_MASK): Ditto.
---
gcc/config/riscv/riscv.cc | 156 ++++++++++++++++++++++++++++++++++++++
1 file changed, 156 insertions(+)
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index dc47434fac4..9af06d926cf 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -60,6 +60,15 @@ along with GCC; see the file COPYING3. If not see
#include "opts.h"
#include "tm-constrs.h"
#include "rtl-iter.h"
+#include "gimple.h"
+#include "cfghooks.h"
+#include "cfgloop.h"
+#include "cfgrtl.h"
+#include "sel-sched.h"
+#include "fold-const.h"
+#include "gimple-iterator.h"
+#include "gimple-expr.h"
+#include "tree-vectorizer.h"
/* This file should be included last. */
#include "target-def.h"
@@ -275,6 +284,9 @@ poly_uint16 riscv_vector_chunks;
/* The number of bytes in a vector chunk. */
unsigned riscv_bytes_per_vector_chunk;
+/* Prefer vf for auto-vectorizer. */
+unsigned riscv_vectorization_factor;
+
/* Index R is the smallest register class that contains register R. */
const enum reg_class riscv_regno_to_class[FIRST_PSEUDO_REGISTER] = {
GR_REGS, GR_REGS, GR_REGS, GR_REGS,
@@ -6363,6 +6375,10 @@ riscv_option_override (void)
/* Convert -march to a chunks count. */
riscv_vector_chunks = riscv_convert_vector_bits ();
+
+ if (TARGET_VECTOR)
+ riscv_vectorization_factor = riscv_vector_lmul;
+
}
/* Implement TARGET_CONDITIONAL_REGISTER_USAGE. */
@@ -7057,6 +7073,128 @@ riscv_dwarf_poly_indeterminate_value (unsigned int i, unsigned int *factor,
return RISCV_DWARF_VLENB;
}
+/* Implement TARGET_ESTIMATED_POLY_VALUE.
+ Look into the tuning structure for an estimate.
+ KIND specifies the type of requested estimate: min, max or likely.
+ For cores with a known RVV width all three estimates are the same.
+ For generic RVV tuning we want to distinguish the maximum estimate from
+ the minimum and likely ones.
+ The likely estimate is the same as the minimum in that case to give a
+ conservative behavior of auto-vectorizing with RVV when it is a win
+ even for 128-bit RVV.
+ When RVV width information is available VAL.coeffs[1] is multiplied by
+ the number of VQ chunks over the initial Advanced SIMD 128 bits. */
+
+static HOST_WIDE_INT
+riscv_estimated_poly_value (poly_int64 val,
+ poly_value_estimate_kind kind = POLY_VALUE_LIKELY)
+{
+ unsigned int width_source = BITS_PER_RISCV_VECTOR.is_constant ()
+ ? (unsigned int) BITS_PER_RISCV_VECTOR.to_constant ()
+ : (unsigned int) RVV_SCALABLE;
+
+ /* If there is no core-specific information then the minimum and likely
+ values are based on 128-bit vectors and the maximum is based on
+ the architectural maximum of 2048 bits. */
+ if (width_source == RVV_SCALABLE)
+ switch (kind)
+ {
+ case POLY_VALUE_MIN:
+ case POLY_VALUE_LIKELY:
+ return val.coeffs[0];
+
+ case POLY_VALUE_MAX:
+ return val.coeffs[0] + val.coeffs[1] * 15;
+ }
+
+ /* Allow BITS_PER_RISCV_VECTOR to be a bitmask of different VL, treating the
+ lowest as likely. This could be made more general if future -mtune
+ options need it to be. */
+ if (kind == POLY_VALUE_MAX)
+ width_source = 1 << floor_log2 (width_source);
+ else
+ width_source = least_bit_hwi (width_source);
+
+ /* If the core provides width information, use that. */
+ HOST_WIDE_INT over_128 = width_source - 128;
+ return val.coeffs[0] + val.coeffs[1] * over_128 / 128;
+}
+
+/* Implement TARGET_VECTORIZE_PREFERRED_SIMD_MODE. */
+
+static machine_mode
+riscv_preferred_simd_mode (scalar_mode mode)
+{
+ machine_mode vmode =
+ riscv_vector::riscv_vector_preferred_simd_mode (mode,
+ riscv_vectorization_factor);
+ if (VECTOR_MODE_P (vmode))
+ return vmode;
+
+ return word_mode;
+}
+
+/* Implement TARGET_AUTOVECTORIZE_VECTOR_MODES for RVV. */
+static unsigned int
+riscv_autovectorize_vector_modes (vector_modes *modes, bool)
+{
+ if (!TARGET_VECTOR)
+ return 0;
+
+ if (riscv_vectorization_factor == RVV_LMUL1)
+ {
+ modes->safe_push (VNx16QImode);
+ modes->safe_push (VNx8QImode);
+ modes->safe_push (VNx4QImode);
+ modes->safe_push (VNx2QImode);
+ }
+ else if (riscv_vectorization_factor == RVV_LMUL2)
+ {
+ modes->safe_push (VNx32QImode);
+ modes->safe_push (VNx16QImode);
+ modes->safe_push (VNx8QImode);
+ modes->safe_push (VNx4QImode);
+ }
+ else if (riscv_vectorization_factor == RVV_LMUL4)
+ {
+ modes->safe_push (VNx64QImode);
+ modes->safe_push (VNx32QImode);
+ modes->safe_push (VNx16QImode);
+ modes->safe_push (VNx8QImode);
+ }
+ else
+ {
+ modes->safe_push (VNx64QImode);
+ modes->safe_push (VNx32QImode);
+ modes->safe_push (VNx16QImode);
+ }
+
+ return 0;
+}
+
+/* Implement TARGET_VECTORIZE_GET_MASK_MODE. */
+
+static opt_machine_mode
+riscv_get_mask_mode (machine_mode mode)
+{
+ machine_mode mask_mode = VOIDmode;
+ if (TARGET_VECTOR
+ && riscv_vector::riscv_vector_get_mask_mode (mode).exists (&mask_mode))
+ return mask_mode;
+
+ return default_get_mask_mode (mode);
+}
+
+/* Implement TARGET_VECTORIZE_EMPTY_MASK_IS_EXPENSIVE. Assume for now that
+ it isn't worth branching around empty masked ops (including masked
+ stores). */
+
+static bool
+riscv_empty_mask_is_expensive (unsigned)
+{
+ return false;
+}
+
/* Return true if a shift-amount matches the trailing cleared bits on
a bitmask. */
@@ -7382,6 +7520,24 @@ riscv_zero_call_used_regs (HARD_REG_SET need_zeroed_hardregs)
#undef TARGET_VERIFY_TYPE_CONTEXT
#define TARGET_VERIFY_TYPE_CONTEXT riscv_verify_type_context
+#undef TARGET_ESTIMATED_POLY_VALUE
+#define TARGET_ESTIMATED_POLY_VALUE riscv_estimated_poly_value
+
+#undef TARGET_VECTORIZE_PREFERRED_SIMD_MODE
+#define TARGET_VECTORIZE_PREFERRED_SIMD_MODE riscv_preferred_simd_mode
+
+#undef TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES
+#define TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES riscv_autovectorize_vector_modes
+
+#undef TARGET_VECTORIZE_GET_MASK_MODE
+#define TARGET_VECTORIZE_GET_MASK_MODE riscv_get_mask_mode
+
+#undef TARGET_VECTORIZE_EMPTY_MASK_IS_EXPENSIVE
+#define TARGET_VECTORIZE_EMPTY_MASK_IS_EXPENSIVE riscv_empty_mask_is_expensive
+
+#undef TARGET_VECTORIZE_LOOP_LEN_OVERRIDE_MASK
+#define TARGET_VECTORIZE_LOOP_LEN_OVERRIDE_MASK riscv_loop_len_override_mask
+
#undef TARGET_VECTOR_ALIGNMENT
#define TARGET_VECTOR_ALIGNMENT riscv_vector_alignment
--
2.34.1