[V2] Machine Description: Add LEN_MASK_{GATHER_LOAD, SCATTER_STORE} pattern

Message ID 20230630023618.3898001-1-juzhe.zhong@rivai.ai
State Unresolved
Series [V2] Machine Description: Add LEN_MASK_{GATHER_LOAD, SCATTER_STORE} pattern

Checks

Context Check Description
snail/gcc-patch-check warning Git am fail log

Commit Message

juzhe.zhong@rivai.ai June 30, 2023, 2:36 a.m. UTC
  From: Ju-Zhe Zhong <juzhe.zhong@rivai.ai>

Hi, Richi and Richard.

This patch adds LEN_MASK_{GATHER_LOAD,SCATTER_STORE} to allow targets to
handle flow control by mask and loop control by length on gather/scatter memory
operations. Consider the following case:

#include <stdint.h>
void
f (uint8_t *restrict a, 
   uint8_t *restrict b, int n,
   int base, int step,
   int *restrict cond)
{
  for (int i = 0; i < n; ++i)
    {
      if (cond[i])
        a[i * step + base] = b[i * step + base];
    }
}

We hope RVV can vectorize such a case into the following IR:

loop_len = SELECT_VL
control_mask = comparison
v = LEN_MASK_GATHER_LOAD (.., loop_len, control_mask)
LEN_MASK_SCATTER_STORE (... v, ..., loop_len, control_mask)
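
To make the intended semantics of the IR above concrete, here is a minimal scalar sketch in C. It is only an illustration and not part of the patch: the names `model_len_mask_gather_load`, `model_len_mask_scatter_store`, `f_model` and `MODEL_VF` are hypothetical, `SELECT_VL` is approximated by taking the minimum of the remaining iterations and a fixed vector factor, and inactive lanes of the gather result are simply left untouched (the documentation in this patch calls them undefined).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical vector factor used only by this model.  */
enum { MODEL_VF = 4 };

/* Model of LEN_MASK_GATHER_LOAD: lane i is loaded only when i < len and
   mask[i] is set.  Other lanes of dst are left untouched here.  */
void
model_len_mask_gather_load (uint8_t *dst, const uint8_t *base,
                            const int *offset, size_t len,
                            const uint8_t *mask)
{
  assert (len <= (size_t) MODEL_VF);
  for (size_t i = 0; i < len; ++i)
    if (mask[i])
      dst[i] = base[offset[i]];
}

/* Model of LEN_MASK_SCATTER_STORE: lane i is stored only when i < len and
   mask[i] is set.  */
void
model_len_mask_scatter_store (uint8_t *base, const int *offset,
                              const uint8_t *src, size_t len,
                              const uint8_t *mask)
{
  assert (len <= (size_t) MODEL_VF);
  for (size_t i = 0; i < len; ++i)
    if (mask[i])
      base[offset[i]] = src[i];
}

/* The function f from the example, strip-mined by MODEL_VF.  */
void
f_model (uint8_t *restrict a, uint8_t *restrict b, int n,
         int base, int step, int *restrict cond)
{
  for (int i = 0; i < n; )
    {
      /* loop_len = SELECT_VL, approximated as min (remaining, MODEL_VF).  */
      size_t remaining = (size_t) (n - i);
      size_t loop_len
        = remaining < (size_t) MODEL_VF ? remaining : (size_t) MODEL_VF;

      int offset[MODEL_VF];
      uint8_t control_mask[MODEL_VF];
      uint8_t v[MODEL_VF] = { 0 };

      for (size_t k = 0; k < loop_len; ++k)
        {
          offset[k] = (i + (int) k) * step + base;
          /* control_mask = comparison  */
          control_mask[k] = cond[(size_t) i + k] != 0;
        }

      model_len_mask_gather_load (v, b, offset, loop_len, control_mask);
      model_len_mask_scatter_store (a, offset, v, loop_len, control_mask);
      i += (int) loop_len;
    }
}
```

The length handles the loop tail (the last iteration may cover fewer than `MODEL_VF` lanes), while the mask handles the `if (cond[i])` inside the loop body, which is why both controls appear on the same call.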

This patch doesn't apply these patterns in the vectorizer; it just adds the
patterns and updates the documentation.

I will send a patch that applies these patterns in the vectorizer soon after
this patch is approved.

Thanks.

gcc/ChangeLog:

        * doc/md.texi: Add LEN_MASK_{GATHER_LOAD,SCATTER_STORE}.
        * internal-fn.cc (expand_scatter_store_optab_fn): Ditto.
        (expand_gather_load_optab_fn): Ditto.
        (internal_load_fn_p): Ditto.
        (internal_store_fn_p): Ditto.
        (internal_gather_scatter_fn_p): Ditto.
        (internal_fn_mask_index): Ditto.
        (internal_fn_stored_value_index): Ditto.
        * internal-fn.def (LEN_MASK_GATHER_LOAD): Ditto.
        (LEN_MASK_SCATTER_STORE): Ditto.
        * optabs.def (OPTAB_CD): Ditto.

---
 gcc/doc/md.texi     | 17 +++++++++++++++++
 gcc/internal-fn.cc  | 32 ++++++++++++++++++++++++++++++--
 gcc/internal-fn.def |  8 ++++++--
 gcc/optabs.def      |  2 ++
 4 files changed, 55 insertions(+), 4 deletions(-)
  

Comments

Richard Biener June 30, 2023, 6:26 a.m. UTC | #1
On Fri, 30 Jun 2023, juzhe.zhong@rivai.ai wrote:

> From: Ju-Zhe Zhong <juzhe.zhong@rivai.ai>
> 
> Hi, Richi and Richard.
> 
> This patch adds LEN_MASK_{GATHER_LOAD,SCATTER_STORE} to allow targets to
> handle flow control by mask and loop control by length on gather/scatter memory
> operations. Consider the following case:
> 
> #include <stdint.h>
> void
> f (uint8_t *restrict a, 
>    uint8_t *restrict b, int n,
>    int base, int step,
>    int *restrict cond)
> {
>   for (int i = 0; i < n; ++i)
>     {
>       if (cond[i])
>         a[i * step + base] = b[i * step + base];
>     }
> }
> 
> We hope RVV can vectorize such a case into the following IR:
> 
> loop_len = SELECT_VL
> control_mask = comparison
> v = LEN_MASK_GATHER_LOAD (.., loop_len, control_mask)
> LEN_MASK_SCATTER_STORE (... v, ..., loop_len, control_mask)
> 
> This patch doesn't apply these patterns in the vectorizer; it just adds the
> patterns and updates the documentation.

I see this doesn't add a BIAS argument - I think we should be consistent
here, at least for the memory access internal functions.  I'll note
that 'len' has issues with scatter/gather anyway since the trick of
handling 'len' in bytes by only providing QImode variants doesn't work
here.  So maybe that's good enough for an argument to not add bias
either ...
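
For readers unfamiliar with the bias argument, the following C sketch shows one way to read it. It is an assumption-laden illustration, not GCC code: it assumes the bias is a constant that is either 0 or -1 and that the number of lanes processed is len + bias, as with the existing length-controlled load/store patterns. The helper name `lane_is_active` is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical predicate: is lane i processed by a length-and-mask
   controlled access?  Assumes the active lane count is len + bias with
   bias restricted to 0 or -1.  */
bool
lane_is_active (size_t i, size_t len, int bias, const uint8_t *mask)
{
  assert (bias == 0 || bias == -1);

  size_t active = len;
  if (bias == -1)
    /* Avoid unsigned wrap-around when len is 0.  */
    active = len > 0 ? len - 1 : 0;

  /* The mask is only consulted for lanes below the active length.  */
  return i < active && mask[i] != 0;
}
```

With a bias of 0 this reduces to the plain "i < len and mask[i] set" rule described in the documentation hunk of this patch.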

Richard, do you have any opinion here?

> I will send a patch that applies these patterns in the vectorizer soon after
> this patch is approved.
> 
> Thanks.
> 
> gcc/ChangeLog:
> 
>         * doc/md.texi: Add LEN_MASK_{GATHER_LOAD,SCATTER_STORE}.
>         * internal-fn.cc (expand_scatter_store_optab_fn): Ditto.
>         (expand_gather_load_optab_fn): Ditto.
>         (internal_load_fn_p): Ditto.
>         (internal_store_fn_p): Ditto.
>         (internal_gather_scatter_fn_p): Ditto.
>         (internal_fn_mask_index): Ditto.

Can you add internal_fn_len_index please and make use of it?
Should have asked for this in the previous patch already I guess.
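
As a rough idea of what such a helper could look like, here is a self-contained C sketch. It does not use GCC's real internal_fn enum: `model_internal_fn` and `model_len_index` are hypothetical stand-ins, and the only value taken from this patch is that the length argument sits at mask_index - 1, i.e. index 4, for the two new functions.

```c
#include <assert.h>

/* Hypothetical stand-in for the internal functions discussed in this
   thread; the real enum in GCC is generated from internal-fn.def.  */
enum model_internal_fn
{
  MODEL_IFN_GATHER_LOAD,
  MODEL_IFN_SCATTER_STORE,
  MODEL_IFN_LEN_MASK_GATHER_LOAD,
  MODEL_IFN_LEN_MASK_SCATTER_STORE
};

/* Return the index of the length argument of FN, or -1 if FN has none.
   In this patch the mask index of the two LEN_MASK_* functions is 5 and
   the length is read from mask_index - 1, hence 4.  */
int
model_len_index (enum model_internal_fn fn)
{
  switch (fn)
    {
    case MODEL_IFN_LEN_MASK_GATHER_LOAD:
    case MODEL_IFN_LEN_MASK_SCATTER_STORE:
      return 4;

    default:
      return -1;
    }
}
```

A helper of this shape would let the expanders query the length position instead of hard-coding `mask_index - 1` or a literal `4`.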

Otherwise this looks good to me.

Thanks,
Richard.

>         (internal_fn_stored_value_index): Ditto.
>         * internal-fn.def (LEN_MASK_GATHER_LOAD): Ditto.
>         (LEN_MASK_SCATTER_STORE): Ditto.
>         * optabs.def (OPTAB_CD): Ditto.
> 
> ---
>  gcc/doc/md.texi     | 17 +++++++++++++++++
>  gcc/internal-fn.cc  | 32 ++++++++++++++++++++++++++++++--
>  gcc/internal-fn.def |  8 ++++++--
>  gcc/optabs.def      |  2 ++
>  4 files changed, 55 insertions(+), 4 deletions(-)
> 
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index 9648fdc846a..b84aaab7075 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -5040,6 +5040,15 @@ operand 5.  Bit @var{i} of the mask is set if element @var{i}
>  of the result should be loaded from memory and clear if element @var{i}
>  of the result should be set to zero.
>  
> +@cindex @code{len_mask_gather_load@var{m}@var{n}} instruction pattern
> +@item @samp{len_mask_gather_load@var{m}@var{n}}
> +Like @samp{gather_load@var{m}@var{n}}, but takes an extra length operand (operand 5)
> +as well as a mask operand (operand 6). Similar to len_maskload, the instruction loads
> +at most (operand 5) elements from memory.
> +Bit @var{i} of the mask is set if element @var{i} of the result should
> +be loaded from memory and clear if element @var{i} of the result should be undefined.
> +Mask elements @var{i} with i > (operand 5) are ignored.
> +
>  @cindex @code{scatter_store@var{m}@var{n}} instruction pattern
>  @item @samp{scatter_store@var{m}@var{n}}
>  Store a vector of mode @var{m} into several distinct memory locations.
> @@ -5069,6 +5078,14 @@ Like @samp{scatter_store@var{m}@var{n}}, but takes an extra mask operand as
>  operand 5.  Bit @var{i} of the mask is set if element @var{i}
>  of the result should be stored to memory.
>  
> +@cindex @code{len_mask_scatter_store@var{m}@var{n}} instruction pattern
> +@item @samp{len_mask_scatter_store@var{m}@var{n}}
> +Like @samp{scatter_store@var{m}@var{n}}, but takes an extra length operand (operand 5)
> +as well as a mask operand (operand 6). The instruction stores at most (operand 5) elements
> +of (operand 4) to memory.
> +Bit @var{i} of the mask is set if element @var{i} of (operand 4) should be stored.
> +Mask elements @var{i} with i > (operand 5) are ignored.
> +
>  @cindex @code{vec_set@var{m}} instruction pattern
>  @item @samp{vec_set@var{m}}
>  Set given field in the vector value.  Operand 0 is the vector to modify,
> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> index 9017176dc7a..e4b558e33d8 100644
> --- a/gcc/internal-fn.cc
> +++ b/gcc/internal-fn.cc
> @@ -3537,7 +3537,7 @@ expand_scatter_store_optab_fn (internal_fn, gcall *stmt, direct_optab optab)
>    HOST_WIDE_INT scale_int = tree_to_shwi (scale);
>    rtx rhs_rtx = expand_normal (rhs);
>  
> -  class expand_operand ops[6];
> +  class expand_operand ops[7];
>    int i = 0;
>    create_address_operand (&ops[i++], base_rtx);
>    create_input_operand (&ops[i++], offset_rtx, TYPE_MODE (TREE_TYPE (offset)));
> @@ -3546,6 +3546,14 @@ expand_scatter_store_optab_fn (internal_fn, gcall *stmt, direct_optab optab)
>    create_input_operand (&ops[i++], rhs_rtx, TYPE_MODE (TREE_TYPE (rhs)));
>    if (mask_index >= 0)
>      {
> +      if (optab == len_mask_scatter_store_optab)
> +	{
> +	  tree len = gimple_call_arg (stmt, mask_index - 1);
> +	  rtx len_rtx = expand_normal (len);
> +	  create_convert_operand_from (&ops[i++], len_rtx,
> +				       TYPE_MODE (TREE_TYPE (len)),
> +				       TYPE_UNSIGNED (TREE_TYPE (len)));
> +	}
>        tree mask = gimple_call_arg (stmt, mask_index);
>        rtx mask_rtx = expand_normal (mask);
>        create_input_operand (&ops[i++], mask_rtx, TYPE_MODE (TREE_TYPE (mask)));
> @@ -3572,7 +3580,7 @@ expand_gather_load_optab_fn (internal_fn, gcall *stmt, direct_optab optab)
>    HOST_WIDE_INT scale_int = tree_to_shwi (scale);
>  
>    int i = 0;
> -  class expand_operand ops[6];
> +  class expand_operand ops[7];
>    create_output_operand (&ops[i++], lhs_rtx, TYPE_MODE (TREE_TYPE (lhs)));
>    create_address_operand (&ops[i++], base_rtx);
>    create_input_operand (&ops[i++], offset_rtx, TYPE_MODE (TREE_TYPE (offset)));
> @@ -3584,6 +3592,17 @@ expand_gather_load_optab_fn (internal_fn, gcall *stmt, direct_optab optab)
>        rtx mask_rtx = expand_normal (mask);
>        create_input_operand (&ops[i++], mask_rtx, TYPE_MODE (TREE_TYPE (mask)));
>      }
> +  else if (optab == len_mask_gather_load_optab)
> +    {
> +      tree len = gimple_call_arg (stmt, 4);
> +      rtx len_rtx = expand_normal (len);
> +      create_convert_operand_from (&ops[i++], len_rtx,
> +				   TYPE_MODE (TREE_TYPE (len)),
> +				   TYPE_UNSIGNED (TREE_TYPE (len)));
> +      tree mask = gimple_call_arg (stmt, 5);
> +      rtx mask_rtx = expand_normal (mask);
> +      create_input_operand (&ops[i++], mask_rtx, TYPE_MODE (TREE_TYPE (mask)));
> +    }
>    insn_code icode = convert_optab_handler (optab, TYPE_MODE (TREE_TYPE (lhs)),
>  					   TYPE_MODE (TREE_TYPE (offset)));
>    expand_insn (icode, i, ops);
> @@ -4434,6 +4453,7 @@ internal_load_fn_p (internal_fn fn)
>      case IFN_MASK_LOAD_LANES:
>      case IFN_GATHER_LOAD:
>      case IFN_MASK_GATHER_LOAD:
> +    case IFN_LEN_MASK_GATHER_LOAD:
>      case IFN_LEN_LOAD:
>      case IFN_LEN_MASK_LOAD:
>        return true;
> @@ -4455,6 +4475,7 @@ internal_store_fn_p (internal_fn fn)
>      case IFN_MASK_STORE_LANES:
>      case IFN_SCATTER_STORE:
>      case IFN_MASK_SCATTER_STORE:
> +    case IFN_LEN_MASK_SCATTER_STORE:
>      case IFN_LEN_STORE:
>      case IFN_LEN_MASK_STORE:
>        return true;
> @@ -4473,8 +4494,10 @@ internal_gather_scatter_fn_p (internal_fn fn)
>      {
>      case IFN_GATHER_LOAD:
>      case IFN_MASK_GATHER_LOAD:
> +    case IFN_LEN_MASK_GATHER_LOAD:
>      case IFN_SCATTER_STORE:
>      case IFN_MASK_SCATTER_STORE:
> +    case IFN_LEN_MASK_SCATTER_STORE:
>        return true;
>  
>      default:
> @@ -4504,6 +4527,10 @@ internal_fn_mask_index (internal_fn fn)
>      case IFN_LEN_MASK_STORE:
>        return 3;
>  
> +    case IFN_LEN_MASK_GATHER_LOAD:
> +    case IFN_LEN_MASK_SCATTER_STORE:
> +      return 5;
> +
>      default:
>        return (conditional_internal_fn_code (fn) != ERROR_MARK
>  	      || get_unconditional_internal_fn (fn) != IFN_LAST ? 0 : -1);
> @@ -4522,6 +4549,7 @@ internal_fn_stored_value_index (internal_fn fn)
>      case IFN_MASK_STORE_LANES:
>      case IFN_SCATTER_STORE:
>      case IFN_MASK_SCATTER_STORE:
> +    case IFN_LEN_MASK_SCATTER_STORE:
>      case IFN_LEN_STORE:
>        return 3;
>  
> diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> index bc947c0fde7..5be24decf88 100644
> --- a/gcc/internal-fn.def
> +++ b/gcc/internal-fn.def
> @@ -48,14 +48,14 @@ along with GCC; see the file COPYING3.  If not see
>     - mask_load: currently just maskload
>     - load_lanes: currently just vec_load_lanes
>     - mask_load_lanes: currently just vec_mask_load_lanes
> -   - gather_load: used for {mask_,}gather_load
> +   - gather_load: used for {mask_,len_mask_,}gather_load
>     - len_load: currently just len_load
>     - len_maskload: currently just len_maskload
>  
>     - mask_store: currently just maskstore
>     - store_lanes: currently just vec_store_lanes
>     - mask_store_lanes: currently just vec_mask_store_lanes
> -   - scatter_store: used for {mask_,}scatter_store
> +   - scatter_store: used for {mask_,len_mask_,}scatter_store
>     - len_store: currently just len_store
>     - len_maskstore: currently just len_maskstore
>  
> @@ -157,6 +157,8 @@ DEF_INTERNAL_OPTAB_FN (MASK_LOAD_LANES, ECF_PURE,
>  DEF_INTERNAL_OPTAB_FN (GATHER_LOAD, ECF_PURE, gather_load, gather_load)
>  DEF_INTERNAL_OPTAB_FN (MASK_GATHER_LOAD, ECF_PURE,
>  		       mask_gather_load, gather_load)
> +DEF_INTERNAL_OPTAB_FN (LEN_MASK_GATHER_LOAD, ECF_PURE,
> +		       len_mask_gather_load, gather_load)
>  
>  DEF_INTERNAL_OPTAB_FN (LEN_LOAD, ECF_PURE, len_load, len_load)
>  DEF_INTERNAL_OPTAB_FN (LEN_MASK_LOAD, ECF_PURE, len_maskload, len_maskload)
> @@ -164,6 +166,8 @@ DEF_INTERNAL_OPTAB_FN (LEN_MASK_LOAD, ECF_PURE, len_maskload, len_maskload)
>  DEF_INTERNAL_OPTAB_FN (SCATTER_STORE, 0, scatter_store, scatter_store)
>  DEF_INTERNAL_OPTAB_FN (MASK_SCATTER_STORE, 0,
>  		       mask_scatter_store, scatter_store)
> +DEF_INTERNAL_OPTAB_FN (LEN_MASK_SCATTER_STORE, 0,
> +		       len_mask_scatter_store, scatter_store)
>  
>  DEF_INTERNAL_OPTAB_FN (MASK_STORE, 0, maskstore, mask_store)
>  DEF_INTERNAL_OPTAB_FN (STORE_LANES, ECF_CONST, vec_store_lanes, store_lanes)
> diff --git a/gcc/optabs.def b/gcc/optabs.def
> index 9533eb11565..58933e61817 100644
> --- a/gcc/optabs.def
> +++ b/gcc/optabs.def
> @@ -95,8 +95,10 @@ OPTAB_CD(len_maskload_optab, "len_maskload$a$b")
>  OPTAB_CD(len_maskstore_optab, "len_maskstore$a$b")
>  OPTAB_CD(gather_load_optab, "gather_load$a$b")
>  OPTAB_CD(mask_gather_load_optab, "mask_gather_load$a$b")
> +OPTAB_CD(len_mask_gather_load_optab, "len_mask_gather_load$a$b")
>  OPTAB_CD(scatter_store_optab, "scatter_store$a$b")
>  OPTAB_CD(mask_scatter_store_optab, "mask_scatter_store$a$b")
> +OPTAB_CD(len_mask_scatter_store_optab, "len_mask_scatter_store$a$b")
>  OPTAB_CD(vec_extract_optab, "vec_extract$a$b")
>  OPTAB_CD(vec_init_optab, "vec_init$a$b")
>  
>
  
juzhe.zhong@rivai.ai June 30, 2023, 6:51 a.m. UTC | #2
>>  I'll note
>> that 'len' has issues with scatter/gather anyway since the trick of
>> handling 'len' in bytes by only providing QImode variants doesn't work
>> here.

For RVV, we prefer 'len' in elements, which matches how the RVV ISA defines its instructions.
For example, for our indexed load:

vsetvli ...len = 4
vluxei v0, (a5), v2, v0.t

In this situation, vluxei will gather-load 4 elements, and the element addresses are:

element 0 address = a5 + v2[0]
element 1 address = a5 + v2[1]
element 2 address = a5 + v2[2]
element 3 address = a5 + v2[3]
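
The address computation above can be sketched in C as follows. This is only a model under stated assumptions: it uses 32-bit elements, treats the index vector as byte offsets from the base, and represents the v0.t mask as one byte per lane. The function name `model_indexed_load32` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Model of a masked indexed load with the length counted in elements:
   for every lane i < vl whose mask byte is set, load one 32-bit element
   from base + byte_index[i].  Inactive lanes of vd are left untouched.  */
void
model_indexed_load32 (uint32_t *vd, const void *base,
                      const uint32_t *byte_index, const uint8_t *mask,
                      size_t vl)
{
  assert (vl == 0 || (vd && base && byte_index && mask));

  for (size_t i = 0; i < vl; ++i)
    if (mask[i])
      /* memcpy avoids alignment assumptions about the computed address.  */
      memcpy (&vd[i], (const unsigned char *) base + byte_index[i],
              sizeof vd[i]);
}
```

Here `vl` plays the role of the length set by vsetvli (4 in the example), so the element count alone determines how many addresses are formed.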
 
I am not sure why you mention 'len' in bytes.

Thanks.


juzhe.zhong@rivai.ai
 
  
Robin Dapp June 30, 2023, 6:55 a.m. UTC | #3
> I am not sure why you mention 'len' in bytes.

The 'trick' for the len_load/len_store patterns is
to allow a QImode/byte-only length rather than elements.
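
A small C sketch may help show why the byte-length trick works for contiguous accesses. It is an illustration only, with hypothetical names `model_len_in_bytes` and `model_len_load_bytes`: an element count is scaled to a byte count, and a contiguous load of that many bytes is equivalent to loading that many elements.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Scale a length in elements to a length in bytes.  */
size_t
model_len_in_bytes (size_t nelems, size_t elem_size)
{
  assert (elem_size > 0 && nelems <= SIZE_MAX / elem_size);
  return nelems * elem_size;
}

/* Model of a contiguous length-controlled load whose length is in bytes:
   only the first len_bytes bytes of dst are written.  */
void
model_len_load_bytes (void *dst, const void *src, size_t len_bytes)
{
  memcpy (dst, src, len_bytes);
}
```

For gather/scatter each element has its own address, so a single byte count from one base pointer no longer describes which elements are accessed. That is the issue Richard raised for these patterns.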

Regards
 Robin
  
juzhe.zhong@rivai.ai June 30, 2023, 7:08 a.m. UTC | #4
Hi, Richi. I have added "BIAS" and sent V4:
https://gcc.gnu.org/pipermail/gcc-patches/2023-June/623293.html 

Forget about V3. I made a mistake there, sorry about that.

Thanks.


juzhe.zhong@rivai.ai
 
From: Richard Biener
Date: 2023-06-30 14:26
To: Ju-Zhe Zhong
CC: gcc-patches; richard.sandiford
Subject: Re: [PATCH V2] Machine Description: Add LEN_MASK_{GATHER_LOAD, SCATTER_STORE} pattern
On Fri, 30 Jun 2023, juzhe.zhong@rivai.ai wrote:
 
> From: Ju-Zhe Zhong <juzhe.zhong@rivai.ai>
> 
> Hi, Richi and Richard.
> 
> This patch is adding LEN_MASK_{GATHER_LOAD,SCATTER_STORE} to allow targets
> handle flow control by mask and loop control by length on gather/scatter memory
> operations. Consider this following case:
> 
> #include <stdint.h>
> void
> f (uint8_t *restrict a, 
>    uint8_t *restrict b, int n,
>    int base, int step,
>    int *restrict cond)
> {
>   for (int i = 0; i < n; ++i)
>     {
>       if (cond[i])
>         a[i * step + base] = b[i * step + base];
>     }
> }
> 
> We hope RVV can vectorize such case into following IR:
> 
> loop_len = SELECT_VL
> control_mask = comparison
> v = LEN_MASK_GATHER_LOAD (.., loop_len, control_mask)
> LEN_SCATTER_STORE (... v, ..., loop_len, control_mask)
> 
> This patch doesn't apply such patterns into vectorizer, just add patterns
> and update the documents.
 
I see this doesn't add a BIAS argument - I think we should be consistent
here, at least for the memory access internal functions.  I'll note
that 'len' has issues with scatter/gather anyway since the trick of
handling 'len' in bytes by only providing QImode variants doesn't work
here.  So maybe that's good enough for an argument to not add bias
either ...
 
Richard, do you have any opinion here?
 
> Will send patch which apply such patterns into vectorizer soon after this
> patch is approved.
> 
> Thanks.
> 
> gcc/ChangeLog:
> 
>         * doc/md.texi: Add LEN_MASK_{GATHER_LOAD,SCATTER_STORE}.
>         * internal-fn.cc (expand_scatter_store_optab_fn): Ditto.
>         (expand_gather_load_optab_fn): Ditto.
>         (internal_load_fn_p): Ditto.
>         (internal_store_fn_p): Ditto.
>         (internal_gather_scatter_fn_p): Ditto.
>         (internal_fn_mask_index): Ditto.
 
Can you add internal_fn_len_index please and make use of it?
Should have asked for this in the previous patch already I guess.
 
Otherwise this looks good to me.
 
Thanks,
Richard.
 
>         (internal_fn_stored_value_index): Ditto.
>         * internal-fn.def (LEN_MASK_GATHER_LOAD): Ditto.
>         (LEN_MASK_SCATTER_STORE): Ditto.
>         * optabs.def (OPTAB_CD): Ditto.
> 
> ---
>  gcc/doc/md.texi     | 17 +++++++++++++++++
>  gcc/internal-fn.cc  | 32 ++++++++++++++++++++++++++++++--
>  gcc/internal-fn.def |  8 ++++++--
>  gcc/optabs.def      |  2 ++
>  4 files changed, 55 insertions(+), 4 deletions(-)
> 
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index 9648fdc846a..b84aaab7075 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -5040,6 +5040,15 @@ operand 5.  Bit @var{i} of the mask is set if element @var{i}
>  of the result should be loaded from memory and clear if element @var{i}
>  of the result should be set to zero.
>  
> +@cindex @code{len_mask_gather_load@var{m}@var{n}} instruction pattern
> +@item @samp{len_mask_gather_load@var{m}@var{n}}
> +Like @samp{gather_load@var{m}@var{n}}, but takes an extra length operand (operand 5)
> +as well as a mask operand (operand 6). Similar to len_maskload, the instruction loads
> +at most (operand 5) elements from memory.
> +Bit @var{i} of the mask is set if element @var{i} of the result should
> +be loaded from memory and clear if element @var{i} of the result should be undefined.
> +Mask elements @var{i} with i > (operand 5) are ignored.
> +
>  @cindex @code{scatter_store@var{m}@var{n}} instruction pattern
>  @item @samp{scatter_store@var{m}@var{n}}
>  Store a vector of mode @var{m} into several distinct memory locations.
> @@ -5069,6 +5078,14 @@ Like @samp{scatter_store@var{m}@var{n}}, but takes an extra mask operand as
>  operand 5.  Bit @var{i} of the mask is set if element @var{i}
>  of the result should be stored to memory.
>  
> +@cindex @code{len_mask_scatter_store@var{m}@var{n}} instruction pattern
> +@item @samp{len_mask_scatter_store@var{m}@var{n}}
> +Like @samp{scatter_store@var{m}@var{n}}, but takes an extra length operand (operand 5)
> +as well as a mask operand (operand 6). The instruction stores at most (operand 5) elements
> +of (operand 4) to memory.
> +Bit @var{i} of the mask is set if element @var{i} of (operand 4) should be stored.
> +Mask elements @var{i} with i > (operand 5) are ignored.
> +
>  @cindex @code{vec_set@var{m}} instruction pattern
>  @item @samp{vec_set@var{m}}
>  Set given field in the vector value.  Operand 0 is the vector to modify,
> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> index 9017176dc7a..e4b558e33d8 100644
> --- a/gcc/internal-fn.cc
> +++ b/gcc/internal-fn.cc
> @@ -3537,7 +3537,7 @@ expand_scatter_store_optab_fn (internal_fn, gcall *stmt, direct_optab optab)
>    HOST_WIDE_INT scale_int = tree_to_shwi (scale);
>    rtx rhs_rtx = expand_normal (rhs);
>  
> -  class expand_operand ops[6];
> +  class expand_operand ops[7];
>    int i = 0;
>    create_address_operand (&ops[i++], base_rtx);
>    create_input_operand (&ops[i++], offset_rtx, TYPE_MODE (TREE_TYPE (offset)));
> @@ -3546,6 +3546,14 @@ expand_scatter_store_optab_fn (internal_fn, gcall *stmt, direct_optab optab)
>    create_input_operand (&ops[i++], rhs_rtx, TYPE_MODE (TREE_TYPE (rhs)));
>    if (mask_index >= 0)
>      {
> +      if (optab == len_mask_scatter_store_optab)
> + {
> +   tree len = gimple_call_arg (stmt, mask_index - 1);
> +   rtx len_rtx = expand_normal (len);
> +   create_convert_operand_from (&ops[i++], len_rtx,
> +        TYPE_MODE (TREE_TYPE (len)),
> +        TYPE_UNSIGNED (TREE_TYPE (len)));
> + }
>        tree mask = gimple_call_arg (stmt, mask_index);
>        rtx mask_rtx = expand_normal (mask);
>        create_input_operand (&ops[i++], mask_rtx, TYPE_MODE (TREE_TYPE (mask)));
> @@ -3572,7 +3580,7 @@ expand_gather_load_optab_fn (internal_fn, gcall *stmt, direct_optab optab)
>    HOST_WIDE_INT scale_int = tree_to_shwi (scale);
>  
>    int i = 0;
> -  class expand_operand ops[6];
> +  class expand_operand ops[7];
>    create_output_operand (&ops[i++], lhs_rtx, TYPE_MODE (TREE_TYPE (lhs)));
>    create_address_operand (&ops[i++], base_rtx);
>    create_input_operand (&ops[i++], offset_rtx, TYPE_MODE (TREE_TYPE (offset)));
> @@ -3584,6 +3592,17 @@ expand_gather_load_optab_fn (internal_fn, gcall *stmt, direct_optab optab)
>        rtx mask_rtx = expand_normal (mask);
>        create_input_operand (&ops[i++], mask_rtx, TYPE_MODE (TREE_TYPE (mask)));
>      }
> +  else if (optab == len_mask_gather_load_optab)
> +    {
> +      tree len = gimple_call_arg (stmt, 4);
> +      rtx len_rtx = expand_normal (len);
> +      create_convert_operand_from (&ops[i++], len_rtx,
> +				   TYPE_MODE (TREE_TYPE (len)),
> +				   TYPE_UNSIGNED (TREE_TYPE (len)));
> +      tree mask = gimple_call_arg (stmt, 5);
> +      rtx mask_rtx = expand_normal (mask);
> +      create_input_operand (&ops[i++], mask_rtx, TYPE_MODE (TREE_TYPE (mask)));
> +    }
>    insn_code icode = convert_optab_handler (optab, TYPE_MODE (TREE_TYPE (lhs)),
>  					   TYPE_MODE (TREE_TYPE (offset)));
>    expand_insn (icode, i, ops);
> @@ -4434,6 +4453,7 @@ internal_load_fn_p (internal_fn fn)
>      case IFN_MASK_LOAD_LANES:
>      case IFN_GATHER_LOAD:
>      case IFN_MASK_GATHER_LOAD:
> +    case IFN_LEN_MASK_GATHER_LOAD:
>      case IFN_LEN_LOAD:
>      case IFN_LEN_MASK_LOAD:
>        return true;
> @@ -4455,6 +4475,7 @@ internal_store_fn_p (internal_fn fn)
>      case IFN_MASK_STORE_LANES:
>      case IFN_SCATTER_STORE:
>      case IFN_MASK_SCATTER_STORE:
> +    case IFN_LEN_MASK_SCATTER_STORE:
>      case IFN_LEN_STORE:
>      case IFN_LEN_MASK_STORE:
>        return true;
> @@ -4473,8 +4494,10 @@ internal_gather_scatter_fn_p (internal_fn fn)
>      {
>      case IFN_GATHER_LOAD:
>      case IFN_MASK_GATHER_LOAD:
> +    case IFN_LEN_MASK_GATHER_LOAD:
>      case IFN_SCATTER_STORE:
>      case IFN_MASK_SCATTER_STORE:
> +    case IFN_LEN_MASK_SCATTER_STORE:
>        return true;
>  
>      default:
> @@ -4504,6 +4527,10 @@ internal_fn_mask_index (internal_fn fn)
>      case IFN_LEN_MASK_STORE:
>        return 3;
>  
> +    case IFN_LEN_MASK_GATHER_LOAD:
> +    case IFN_LEN_MASK_SCATTER_STORE:
> +      return 5;
> +
>      default:
>        return (conditional_internal_fn_code (fn) != ERROR_MARK
>  	      || get_unconditional_internal_fn (fn) != IFN_LAST ? 0 : -1);
> @@ -4522,6 +4549,7 @@ internal_fn_stored_value_index (internal_fn fn)
>      case IFN_MASK_STORE_LANES:
>      case IFN_SCATTER_STORE:
>      case IFN_MASK_SCATTER_STORE:
> +    case IFN_LEN_MASK_SCATTER_STORE:
>      case IFN_LEN_STORE:
>        return 3;
>  
> diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> index bc947c0fde7..5be24decf88 100644
> --- a/gcc/internal-fn.def
> +++ b/gcc/internal-fn.def
> @@ -48,14 +48,14 @@ along with GCC; see the file COPYING3.  If not see
>     - mask_load: currently just maskload
>     - load_lanes: currently just vec_load_lanes
>     - mask_load_lanes: currently just vec_mask_load_lanes
> -   - gather_load: used for {mask_,}gather_load
> +   - gather_load: used for {mask_,len_mask_,}gather_load
>     - len_load: currently just len_load
>     - len_maskload: currently just len_maskload
>  
>     - mask_store: currently just maskstore
>     - store_lanes: currently just vec_store_lanes
>     - mask_store_lanes: currently just vec_mask_store_lanes
> -   - scatter_store: used for {mask_,}scatter_store
> +   - scatter_store: used for {mask_,len_mask_,}scatter_store
>     - len_store: currently just len_store
>     - len_maskstore: currently just len_maskstore
>  
> @@ -157,6 +157,8 @@ DEF_INTERNAL_OPTAB_FN (MASK_LOAD_LANES, ECF_PURE,
>  DEF_INTERNAL_OPTAB_FN (GATHER_LOAD, ECF_PURE, gather_load, gather_load)
>  DEF_INTERNAL_OPTAB_FN (MASK_GATHER_LOAD, ECF_PURE,
>  		       mask_gather_load, gather_load)
> +DEF_INTERNAL_OPTAB_FN (LEN_MASK_GATHER_LOAD, ECF_PURE,
> +		       len_mask_gather_load, gather_load)
>  
>  DEF_INTERNAL_OPTAB_FN (LEN_LOAD, ECF_PURE, len_load, len_load)
>  DEF_INTERNAL_OPTAB_FN (LEN_MASK_LOAD, ECF_PURE, len_maskload, len_maskload)
> @@ -164,6 +166,8 @@ DEF_INTERNAL_OPTAB_FN (LEN_MASK_LOAD, ECF_PURE, len_maskload, len_maskload)
>  DEF_INTERNAL_OPTAB_FN (SCATTER_STORE, 0, scatter_store, scatter_store)
>  DEF_INTERNAL_OPTAB_FN (MASK_SCATTER_STORE, 0,
>  		       mask_scatter_store, scatter_store)
> +DEF_INTERNAL_OPTAB_FN (LEN_MASK_SCATTER_STORE, 0,
> +		       len_mask_scatter_store, scatter_store)
>  
>  DEF_INTERNAL_OPTAB_FN (MASK_STORE, 0, maskstore, mask_store)
>  DEF_INTERNAL_OPTAB_FN (STORE_LANES, ECF_CONST, vec_store_lanes, store_lanes)
> diff --git a/gcc/optabs.def b/gcc/optabs.def
> index 9533eb11565..58933e61817 100644
> --- a/gcc/optabs.def
> +++ b/gcc/optabs.def
> @@ -95,8 +95,10 @@ OPTAB_CD(len_maskload_optab, "len_maskload$a$b")
>  OPTAB_CD(len_maskstore_optab, "len_maskstore$a$b")
>  OPTAB_CD(gather_load_optab, "gather_load$a$b")
>  OPTAB_CD(mask_gather_load_optab, "mask_gather_load$a$b")
> +OPTAB_CD(len_mask_gather_load_optab, "len_mask_gather_load$a$b")
>  OPTAB_CD(scatter_store_optab, "scatter_store$a$b")
>  OPTAB_CD(mask_scatter_store_optab, "mask_scatter_store$a$b")
> +OPTAB_CD(len_mask_scatter_store_optab, "len_mask_scatter_store$a$b")
>  OPTAB_CD(vec_extract_optab, "vec_extract$a$b")
>  OPTAB_CD(vec_init_optab, "vec_init$a$b")
>  
> 
 
-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)
  

Patch

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 9648fdc846a..b84aaab7075 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -5040,6 +5040,15 @@  operand 5.  Bit @var{i} of the mask is set if element @var{i}
 of the result should be loaded from memory and clear if element @var{i}
 of the result should be set to zero.
 
+@cindex @code{len_mask_gather_load@var{m}@var{n}} instruction pattern
+@item @samp{len_mask_gather_load@var{m}@var{n}}
+Like @samp{gather_load@var{m}@var{n}}, but takes an extra length operand (operand 5)
+as well as a mask operand (operand 6). Similar to len_maskload, the instruction loads
+at most (operand 5) elements from memory.
+Bit @var{i} of the mask is set if element @var{i} of the result should
+be loaded from memory and clear if element @var{i} of the result should be undefined.
+Mask elements @var{i} with i > (operand 5) are ignored.
+
 @cindex @code{scatter_store@var{m}@var{n}} instruction pattern
 @item @samp{scatter_store@var{m}@var{n}}
 Store a vector of mode @var{m} into several distinct memory locations.
@@ -5069,6 +5078,14 @@  Like @samp{scatter_store@var{m}@var{n}}, but takes an extra mask operand as
 operand 5.  Bit @var{i} of the mask is set if element @var{i}
 of the result should be stored to memory.
 
+@cindex @code{len_mask_scatter_store@var{m}@var{n}} instruction pattern
+@item @samp{len_mask_scatter_store@var{m}@var{n}}
+Like @samp{scatter_store@var{m}@var{n}}, but takes an extra length operand (operand 5)
+as well as a mask operand (operand 6). The instruction stores at most (operand 5) elements
+of (operand 4) to memory.
+Bit @var{i} of the mask is set if element @var{i} of (operand 4) should be stored.
+Mask elements @var{i} with i > (operand 5) are ignored.
+
 @cindex @code{vec_set@var{m}} instruction pattern
 @item @samp{vec_set@var{m}}
 Set given field in the vector value.  Operand 0 is the vector to modify,
diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index 9017176dc7a..e4b558e33d8 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -3537,7 +3537,7 @@  expand_scatter_store_optab_fn (internal_fn, gcall *stmt, direct_optab optab)
   HOST_WIDE_INT scale_int = tree_to_shwi (scale);
   rtx rhs_rtx = expand_normal (rhs);
 
-  class expand_operand ops[6];
+  class expand_operand ops[7];
   int i = 0;
   create_address_operand (&ops[i++], base_rtx);
   create_input_operand (&ops[i++], offset_rtx, TYPE_MODE (TREE_TYPE (offset)));
@@ -3546,6 +3546,14 @@  expand_scatter_store_optab_fn (internal_fn, gcall *stmt, direct_optab optab)
   create_input_operand (&ops[i++], rhs_rtx, TYPE_MODE (TREE_TYPE (rhs)));
   if (mask_index >= 0)
     {
+      if (optab == len_mask_scatter_store_optab)
+	{
+	  tree len = gimple_call_arg (stmt, mask_index - 1);
+	  rtx len_rtx = expand_normal (len);
+	  create_convert_operand_from (&ops[i++], len_rtx,
+				       TYPE_MODE (TREE_TYPE (len)),
+				       TYPE_UNSIGNED (TREE_TYPE (len)));
+	}
       tree mask = gimple_call_arg (stmt, mask_index);
       rtx mask_rtx = expand_normal (mask);
       create_input_operand (&ops[i++], mask_rtx, TYPE_MODE (TREE_TYPE (mask)));
@@ -3572,7 +3580,7 @@  expand_gather_load_optab_fn (internal_fn, gcall *stmt, direct_optab optab)
   HOST_WIDE_INT scale_int = tree_to_shwi (scale);
 
   int i = 0;
-  class expand_operand ops[6];
+  class expand_operand ops[7];
   create_output_operand (&ops[i++], lhs_rtx, TYPE_MODE (TREE_TYPE (lhs)));
   create_address_operand (&ops[i++], base_rtx);
   create_input_operand (&ops[i++], offset_rtx, TYPE_MODE (TREE_TYPE (offset)));
@@ -3584,6 +3592,17 @@  expand_gather_load_optab_fn (internal_fn, gcall *stmt, direct_optab optab)
       rtx mask_rtx = expand_normal (mask);
       create_input_operand (&ops[i++], mask_rtx, TYPE_MODE (TREE_TYPE (mask)));
     }
+  else if (optab == len_mask_gather_load_optab)
+    {
+      tree len = gimple_call_arg (stmt, 4);
+      rtx len_rtx = expand_normal (len);
+      create_convert_operand_from (&ops[i++], len_rtx,
+				   TYPE_MODE (TREE_TYPE (len)),
+				   TYPE_UNSIGNED (TREE_TYPE (len)));
+      tree mask = gimple_call_arg (stmt, 5);
+      rtx mask_rtx = expand_normal (mask);
+      create_input_operand (&ops[i++], mask_rtx, TYPE_MODE (TREE_TYPE (mask)));
+    }
   insn_code icode = convert_optab_handler (optab, TYPE_MODE (TREE_TYPE (lhs)),
 					   TYPE_MODE (TREE_TYPE (offset)));
   expand_insn (icode, i, ops);
@@ -4434,6 +4453,7 @@  internal_load_fn_p (internal_fn fn)
     case IFN_MASK_LOAD_LANES:
     case IFN_GATHER_LOAD:
     case IFN_MASK_GATHER_LOAD:
+    case IFN_LEN_MASK_GATHER_LOAD:
     case IFN_LEN_LOAD:
     case IFN_LEN_MASK_LOAD:
       return true;
@@ -4455,6 +4475,7 @@  internal_store_fn_p (internal_fn fn)
     case IFN_MASK_STORE_LANES:
     case IFN_SCATTER_STORE:
     case IFN_MASK_SCATTER_STORE:
+    case IFN_LEN_MASK_SCATTER_STORE:
     case IFN_LEN_STORE:
     case IFN_LEN_MASK_STORE:
       return true;
@@ -4473,8 +4494,10 @@  internal_gather_scatter_fn_p (internal_fn fn)
     {
     case IFN_GATHER_LOAD:
     case IFN_MASK_GATHER_LOAD:
+    case IFN_LEN_MASK_GATHER_LOAD:
     case IFN_SCATTER_STORE:
     case IFN_MASK_SCATTER_STORE:
+    case IFN_LEN_MASK_SCATTER_STORE:
       return true;
 
     default:
@@ -4504,6 +4527,10 @@  internal_fn_mask_index (internal_fn fn)
     case IFN_LEN_MASK_STORE:
       return 3;
 
+    case IFN_LEN_MASK_GATHER_LOAD:
+    case IFN_LEN_MASK_SCATTER_STORE:
+      return 5;
+
     default:
       return (conditional_internal_fn_code (fn) != ERROR_MARK
 	      || get_unconditional_internal_fn (fn) != IFN_LAST ? 0 : -1);
@@ -4522,6 +4549,7 @@  internal_fn_stored_value_index (internal_fn fn)
     case IFN_MASK_STORE_LANES:
     case IFN_SCATTER_STORE:
     case IFN_MASK_SCATTER_STORE:
+    case IFN_LEN_MASK_SCATTER_STORE:
     case IFN_LEN_STORE:
       return 3;
 
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index bc947c0fde7..5be24decf88 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -48,14 +48,14 @@  along with GCC; see the file COPYING3.  If not see
    - mask_load: currently just maskload
    - load_lanes: currently just vec_load_lanes
    - mask_load_lanes: currently just vec_mask_load_lanes
-   - gather_load: used for {mask_,}gather_load
+   - gather_load: used for {mask_,len_mask_,}gather_load
    - len_load: currently just len_load
    - len_maskload: currently just len_maskload
 
    - mask_store: currently just maskstore
    - store_lanes: currently just vec_store_lanes
    - mask_store_lanes: currently just vec_mask_store_lanes
-   - scatter_store: used for {mask_,}scatter_store
+   - scatter_store: used for {mask_,len_mask_,}scatter_store
    - len_store: currently just len_store
    - len_maskstore: currently just len_maskstore
 
@@ -157,6 +157,8 @@  DEF_INTERNAL_OPTAB_FN (MASK_LOAD_LANES, ECF_PURE,
 DEF_INTERNAL_OPTAB_FN (GATHER_LOAD, ECF_PURE, gather_load, gather_load)
 DEF_INTERNAL_OPTAB_FN (MASK_GATHER_LOAD, ECF_PURE,
 		       mask_gather_load, gather_load)
+DEF_INTERNAL_OPTAB_FN (LEN_MASK_GATHER_LOAD, ECF_PURE,
+		       len_mask_gather_load, gather_load)
 
 DEF_INTERNAL_OPTAB_FN (LEN_LOAD, ECF_PURE, len_load, len_load)
 DEF_INTERNAL_OPTAB_FN (LEN_MASK_LOAD, ECF_PURE, len_maskload, len_maskload)
@@ -164,6 +166,8 @@  DEF_INTERNAL_OPTAB_FN (LEN_MASK_LOAD, ECF_PURE, len_maskload, len_maskload)
 DEF_INTERNAL_OPTAB_FN (SCATTER_STORE, 0, scatter_store, scatter_store)
 DEF_INTERNAL_OPTAB_FN (MASK_SCATTER_STORE, 0,
 		       mask_scatter_store, scatter_store)
+DEF_INTERNAL_OPTAB_FN (LEN_MASK_SCATTER_STORE, 0,
+		       len_mask_scatter_store, scatter_store)
 
 DEF_INTERNAL_OPTAB_FN (MASK_STORE, 0, maskstore, mask_store)
 DEF_INTERNAL_OPTAB_FN (STORE_LANES, ECF_CONST, vec_store_lanes, store_lanes)
diff --git a/gcc/optabs.def b/gcc/optabs.def
index 9533eb11565..58933e61817 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -95,8 +95,10 @@  OPTAB_CD(len_maskload_optab, "len_maskload$a$b")
 OPTAB_CD(len_maskstore_optab, "len_maskstore$a$b")
 OPTAB_CD(gather_load_optab, "gather_load$a$b")
 OPTAB_CD(mask_gather_load_optab, "mask_gather_load$a$b")
+OPTAB_CD(len_mask_gather_load_optab, "len_mask_gather_load$a$b")
 OPTAB_CD(scatter_store_optab, "scatter_store$a$b")
 OPTAB_CD(mask_scatter_store_optab, "mask_scatter_store$a$b")
+OPTAB_CD(len_mask_scatter_store_optab, "len_mask_scatter_store$a$b")
 OPTAB_CD(vec_extract_optab, "vec_extract$a$b")
 OPTAB_CD(vec_init_optab, "vec_init$a$b")
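
For readers of the md.texi hunks above, here is a rough scalar model of the semantics described for len_mask_gather_load and len_mask_scatter_store. It is only a sketch and is not part of the patch. The function names, the nunits parameter and the use of a bool array for the mask are hypothetical. It assumes "at most (operand 5) elements" means that only indices 0 .. len-1 can be active, and it models "undefined" result elements by leaving the destination untouched.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of len_mask_gather_load for byte elements.
   Element i is loaded only if i < len and mask[i] is set.  Inactive
   elements are left untouched, standing in for "undefined".  */
void
model_len_mask_gather_load (uint8_t *dst, const uint8_t *base,
                            const int32_t *offset, int scale,
                            size_t len, const bool *mask, size_t nunits)
{
  for (size_t i = 0; i < nunits && i < len; ++i)
    if (mask[i])
      dst[i] = base[(ptrdiff_t) offset[i] * scale];
}

/* Hypothetical scalar model of len_mask_scatter_store for byte elements.
   Element i of src is stored only if i < len and mask[i] is set.  */
void
model_len_mask_scatter_store (uint8_t *base, const int32_t *offset,
                              int scale, const uint8_t *src,
                              size_t len, const bool *mask, size_t nunits)
{
  for (size_t i = 0; i < nunits && i < len; ++i)
    if (mask[i])
      base[(ptrdiff_t) offset[i] * scale] = src[i];
}
```

In both helpers the length bounds the loop and the mask filters within that bound, which matches the "loop control by length, flow control by mask" split in the commit message. The argument order follows the internal function call (base, offset, scale, [value], len, mask) rather than the operand numbering in md.texi.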