vect: while_ult for integer mask

Message ID 87180de9-d0d4-b92f-405f-100aca3d5cf8@codesourcery.com
State Accepted, archived
Headers
Series vect: while_ult for integer mask |

Checks

Context Check Description
snail/gcc-patches-check success Github commit url

Commit Message

Andrew Stubbs Sept. 28, 2022, 3:05 p.m. UTC
  This patch is a prerequisite for some amdgcn patches I'm working on to 
support shorter vector lengths (having fixed 64 lanes tends to miss 
optimizations, and masking is not supported everywhere yet).

The problem is that, unlike AArch64, I'm not using different mask modes 
for different sized vectors, so all loops end up using the while_ultsidi 
pattern, regardless of vector length.  In theory I could use SImode for 
V32, HImode for V16, etc., but there's no mode to fit V4 or V2 so 
something else is needed.  Moving to using vector masks in the backend 
is not a natural fit for GCN, and would be a huge task in any case.

This patch adds an additional length operand so that we can distinguish 
the different uses in the back end and don't end up with more lanes 
enabled than there ought to be.

I've made the extra operand conditional on the mode so that I don't have 
to modify the AArch64 backend; that uses while_<cond> family of 
operators in a lot of places and uses iterators, so it would end up 
touching a lot of code just to add an inactive operand, plus I don't 
have a way to test it properly.  I've confirmed that AArch64 builds and 
expands while_ult correctly in a simple example.

OK for mainline?

Thanks

Andrew
vect: while_ult for integer masks

Add a vector length parameter needed by amdgcn without breaking aarch64.

All amdgcn vector masks are DImode, regardless of vector length, so we can't
tell what length is implied simply from the operator mode.  (Even if we used
different integer modes there's no mode small enough to differenciate a 2 or
4 lane mask).  Without knowing the intended length we end up using a mask with
too many lanes enabled, which leads to undefined behaviour..

The extra operand is not added for vector mask types so AArch64 does not need
to be adjusted.

gcc/ChangeLog:

	* config/gcn/gcn-valu.md (while_ultsidi): Limit mask length using
	operand 3.
	* doc/md.texi (while_ult): Document new operand 3 usage.
	* internal-fn.cc (expand_while_optab_fn): Set operand 3 when lhs_type
	maps to a non-vector mode.
  

Comments

Richard Biener Sept. 29, 2022, 7:52 a.m. UTC | #1
On Wed, Sep 28, 2022 at 5:06 PM Andrew Stubbs <ams@codesourcery.com> wrote:
>
> This patch is a prerequisite for some amdgcn patches I'm working on to
> support shorter vector lengths (having fixed 64 lanes tends to miss
> optimizations, and masking is not supported everywhere yet).
>
> The problem is that, unlike AArch64, I'm not using different mask modes
> for different sized vectors, so all loops end up using the while_ultsidi
> pattern, regardless of vector length.  In theory I could use SImode for
> V32, HImode for V16, etc., but there's no mode to fit V4 or V2 so
> something else is needed.  Moving to using vector masks in the backend
> is not a natural fit for GCN, and would be a huge task in any case.
>
> This patch adds an additional length operand so that we can distinguish
> the different uses in the back end and don't end up with more lanes
> enabled than there ought to be.
>
> I've made the extra operand conditional on the mode so that I don't have
> to modify the AArch64 backend; that uses while_<cond> family of
> operators in a lot of places and uses iterators, so it would end up
> touching a lot of code just to add an inactive operand, plus I don't
> have a way to test it properly.  I've confirmed that AArch64 builds and
> expands while_ult correctly in a simple example.
>
> OK for mainline?

Hmm, but you could introduce BI4mode and BI2mode for V4 and V2, no?
Not sure if it is possible to have two partial integer modes and use those.

Richard.

>
> Thanks
>
> Andrew
  
Richard Sandiford Sept. 29, 2022, 9:24 a.m. UTC | #2
Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> On Wed, Sep 28, 2022 at 5:06 PM Andrew Stubbs <ams@codesourcery.com> wrote:
>>
>> This patch is a prerequisite for some amdgcn patches I'm working on to
>> support shorter vector lengths (having fixed 64 lanes tends to miss
>> optimizations, and masking is not supported everywhere yet).
>>
>> The problem is that, unlike AArch64, I'm not using different mask modes
>> for different sized vectors, so all loops end up using the while_ultsidi
>> pattern, regardless of vector length.  In theory I could use SImode for
>> V32, HImode for V16, etc., but there's no mode to fit V4 or V2 so
>> something else is needed.  Moving to using vector masks in the backend
>> is not a natural fit for GCN, and would be a huge task in any case.
>>
>> This patch adds an additional length operand so that we can distinguish
>> the different uses in the back end and don't end up with more lanes
>> enabled than there ought to be.
>>
>> I've made the extra operand conditional on the mode so that I don't have
>> to modify the AArch64 backend; that uses while_<cond> family of
>> operators in a lot of places and uses iterators, so it would end up
>> touching a lot of code just to add an inactive operand, plus I don't
>> have a way to test it properly.  I've confirmed that AArch64 builds and
>> expands while_ult correctly in a simple example.
>>
>> OK for mainline?
>
> Hmm, but you could introduce BI4mode and BI2mode for V4 and V2, no?
> Not sure if it is possible to have two partial integer modes and use those.

Might be difficult to do cleanly, since BI is very much a special case.
But I agree that that would better fit the existing scheme.

Otherwise:

  operand0[0] = operand1 < operand2;
  for (i = 1; i < operand3; i++)
    operand0[i] = operand0[i - 1] && (operand1 + i < operand2);

looks like a "length and mask" operation, which IIUC is also what
RVV wanted?  (Wasn't at the Cauldron, so not entirely sure.)

Perhaps the difference is that in this case the length must be constant.
(Or is that true for RVV as well?)

Thanks,
Richard
  
Richard Biener Sept. 29, 2022, 9:37 a.m. UTC | #3
On Thu, Sep 29, 2022 at 11:24 AM Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> > On Wed, Sep 28, 2022 at 5:06 PM Andrew Stubbs <ams@codesourcery.com> wrote:
> >>
> >> This patch is a prerequisite for some amdgcn patches I'm working on to
> >> support shorter vector lengths (having fixed 64 lanes tends to miss
> >> optimizations, and masking is not supported everywhere yet).
> >>
> >> The problem is that, unlike AArch64, I'm not using different mask modes
> >> for different sized vectors, so all loops end up using the while_ultsidi
> >> pattern, regardless of vector length.  In theory I could use SImode for
> >> V32, HImode for V16, etc., but there's no mode to fit V4 or V2 so
> >> something else is needed.  Moving to using vector masks in the backend
> >> is not a natural fit for GCN, and would be a huge task in any case.
> >>
> >> This patch adds an additional length operand so that we can distinguish
> >> the different uses in the back end and don't end up with more lanes
> >> enabled than there ought to be.
> >>
> >> I've made the extra operand conditional on the mode so that I don't have
> >> to modify the AArch64 backend; that uses while_<cond> family of
> >> operators in a lot of places and uses iterators, so it would end up
> >> touching a lot of code just to add an inactive operand, plus I don't
> >> have a way to test it properly.  I've confirmed that AArch64 builds and
> >> expands while_ult correctly in a simple example.
> >>
> >> OK for mainline?
> >
> > Hmm, but you could introduce BI4mode and BI2mode for V4 and V2, no?
> > Not sure if it is possible to have two partial integer modes and use those.
>
> Might be difficult to do cleanly, since BI is very much a special case.
> But I agree that that would better fit the existing scheme.
>
> Otherwise:
>
>   operand0[0] = operand1 < operand2;
>   for (i = 1; i < operand3; i++)
>     operand0[i] = operand0[i - 1] && (operand1 + i < operand2);
>
> looks like a "length and mask" operation, which IIUC is also what
> RVV wanted?  (Wasn't at the Cauldron, so not entirely sure.)
>
> Perhaps the difference is that in this case the length must be constant.
> (Or is that true for RVV as well?)

I think the length is variable and queried at runtime but it might be also
used when compiling with a fixed length vector size.

Note x86 with its integer mode AVX512 masks runs into similar issues
but just uses QImode to DImode (but doesn't exercise this particular pattern).
It basically relies on the actual machine instructions not enabling the
particular lanes, like when doing a V2DFmode compare to produce a mask.
For the while_ult that's of course a bit hard to achieve (sadly AVX512 doesn't
have any such capability and my attempts to emulate have been either
unsuccessfully or slow)

Richard.

>
> Thanks,
> Richard
  
juzhe.zhong@rivai.ai Sept. 29, 2022, 9:38 a.m. UTC | #4
No, this is not RVV wanted.
There is the example that how RVV uses length and mask:
for (int i = 0; i < n; i++)
  if (cond[i] == 0)
    a[i] = b[i] + c[i];

The gimple IR should be this:

AVL = #number of element to be updated.
TVL  = #number of total element to be updated
vect_0 = vector of cond[i]
vect_1 = vector of a[i]
vect_2 = vector of b[i]
vect_3 = vector of c[i]

....
body:
AVL  = WHILE_LEN (TVL,.....) This is new pattern I add for RVV to calculate the active vector length, the pattern assume AVL <= TVL (always).

mask_0 = vect_0 == {0,0,0,....} comparison to generate mask for predicate.

vect_1 = len_cond_len (mask0, vect_1, vect_2, vect_3, AVL) 
This is also the new pattern, When predicate (mask bit = 1 && elment index < AVL) is 1, update vect_1 = vect_2 + vect_3. Otherwise vect_1 = vect_1)
......
TVL = TVL - AVL (decrease the counter, until it becomes 0 to exit the loop)
if (TVL ==0)
  exit loop
......


juzhe.zhong@rivai.ai
 
From: Richard Sandiford
Date: 2022-09-29 17:24
To: Richard Biener via Gcc-patches
CC: Andrew Stubbs; Richard Biener; juzhe.zhong
Subject: Re: [PATCH] vect: while_ult for integer mask
Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> On Wed, Sep 28, 2022 at 5:06 PM Andrew Stubbs <ams@codesourcery.com> wrote:
>>
>> This patch is a prerequisite for some amdgcn patches I'm working on to
>> support shorter vector lengths (having fixed 64 lanes tends to miss
>> optimizations, and masking is not supported everywhere yet).
>>
>> The problem is that, unlike AArch64, I'm not using different mask modes
>> for different sized vectors, so all loops end up using the while_ultsidi
>> pattern, regardless of vector length.  In theory I could use SImode for
>> V32, HImode for V16, etc., but there's no mode to fit V4 or V2 so
>> something else is needed.  Moving to using vector masks in the backend
>> is not a natural fit for GCN, and would be a huge task in any case.
>>
>> This patch adds an additional length operand so that we can distinguish
>> the different uses in the back end and don't end up with more lanes
>> enabled than there ought to be.
>>
>> I've made the extra operand conditional on the mode so that I don't have
>> to modify the AArch64 backend; that uses while_<cond> family of
>> operators in a lot of places and uses iterators, so it would end up
>> touching a lot of code just to add an inactive operand, plus I don't
>> have a way to test it properly.  I've confirmed that AArch64 builds and
>> expands while_ult correctly in a simple example.
>>
>> OK for mainline?
>
> Hmm, but you could introduce BI4mode and BI2mode for V4 and V2, no?
> Not sure if it is possible to have two partial integer modes and use those.
 
Might be difficult to do cleanly, since BI is very much a special case.
But I agree that that would better fit the existing scheme.
 
Otherwise:
 
  operand0[0] = operand1 < operand2;
  for (i = 1; i < operand3; i++)
    operand0[i] = operand0[i - 1] && (operand1 + i < operand2);
 
looks like a "length and mask" operation, which IIUC is also what
RVV wanted?  (Wasn't at the Cauldron, so not entirely sure.)
 
Perhaps the difference is that in this case the length must be constant.
(Or is that true for RVV as well?)
 
Thanks,
Richard
  
Andrew Stubbs Sept. 29, 2022, 9:50 a.m. UTC | #5
On 29/09/2022 08:52, Richard Biener wrote:
> On Wed, Sep 28, 2022 at 5:06 PM Andrew Stubbs <ams@codesourcery.com> wrote:
>>
>> This patch is a prerequisite for some amdgcn patches I'm working on to
>> support shorter vector lengths (having fixed 64 lanes tends to miss
>> optimizations, and masking is not supported everywhere yet).
>>
>> The problem is that, unlike AArch64, I'm not using different mask modes
>> for different sized vectors, so all loops end up using the while_ultsidi
>> pattern, regardless of vector length.  In theory I could use SImode for
>> V32, HImode for V16, etc., but there's no mode to fit V4 or V2 so
>> something else is needed.  Moving to using vector masks in the backend
>> is not a natural fit for GCN, and would be a huge task in any case.
>>
>> This patch adds an additional length operand so that we can distinguish
>> the different uses in the back end and don't end up with more lanes
>> enabled than there ought to be.
>>
>> I've made the extra operand conditional on the mode so that I don't have
>> to modify the AArch64 backend; that uses while_<cond> family of
>> operators in a lot of places and uses iterators, so it would end up
>> touching a lot of code just to add an inactive operand, plus I don't
>> have a way to test it properly.  I've confirmed that AArch64 builds and
>> expands while_ult correctly in a simple example.
>>
>> OK for mainline?
> 
> Hmm, but you could introduce BI4mode and BI2mode for V4 and V2, no?
> Not sure if it is possible to have two partial integer modes and use those.

When we first tried to do this port we tried to use V64BImode for masks 
and got into a horrible mess.  DImode works much better.  In any case, 
at this point retrofitting new mask types into the back end would be a 
big job.

We also have the problem that the mask register is actually two 32-bit 
registers so if you try to use smaller modes the compiler ends up 
leaving the high part undefined and bad things happen. Basically, 
regardless of the notional size of the vector, the mask really is 
64-bit, and the high bits really do have to be well defined (to zero).

The problem is simply that while_ult has lost information in the 
lowering and expanding process. The size of the vector was clear in 
gimple, but lost in RTL.

Andrew
  
Andrew Stubbs Sept. 29, 2022, 9:56 a.m. UTC | #6
On 29/09/2022 10:24, Richard Sandiford wrote:
> Otherwise:
> 
>    operand0[0] = operand1 < operand2;
>    for (i = 1; i < operand3; i++)
>      operand0[i] = operand0[i - 1] && (operand1 + i < operand2);
> 
> looks like a "length and mask" operation, which IIUC is also what
> RVV wanted?  (Wasn't at the Cauldron, so not entirely sure.)
> 
> Perhaps the difference is that in this case the length must be constant.
> (Or is that true for RVV as well?)

I too saw that presentation and I have compared notes with Juzhe before 
posting this.

As he has posted, what they want is different because their config 
register has an explicit length field whereas GCN just uses a mask to 
limit the length (more like AArch64, I think).

The RVV solution uses different logic in the gimple IR; this proposal is 
indistinguishable from the status quo at that point.

Andrew
  
Richard Sandiford Sept. 29, 2022, 10:17 a.m. UTC | #7
Andrew Stubbs <ams@codesourcery.com> writes:
> On 29/09/2022 10:24, Richard Sandiford wrote:
>> Otherwise:
>> 
>>    operand0[0] = operand1 < operand2;
>>    for (i = 1; i < operand3; i++)
>>      operand0[i] = operand0[i - 1] && (operand1 + i < operand2);
>> 
>> looks like a "length and mask" operation, which IIUC is also what
>> RVV wanted?  (Wasn't at the Cauldron, so not entirely sure.)
>> 
>> Perhaps the difference is that in this case the length must be constant.
>> (Or is that true for RVV as well?)
>
> I too saw that presentation and I have compared notes with Juzhe before 
> posting this.
>
> As he has posted, what they want is different because their config 
> register has an explicit length field whereas GCN just uses a mask to 
> limit the length (more like AArch64, I think).
>
> The RVV solution uses different logic in the gimple IR; this proposal is 
> indistinguishable from the status quo at that point.

Hmm, OK.  (And thanks to Juzhe for the summary.)

I can't think of any existing examples of optabs that have a variable
number of operands.  But maybe this is a good reason to change that.

Having to add what amounts to a vector type descriptor to make up for
the lack of mode information seems like a bit of a hack.  But it's
possibly a hack that we'll need to do again (for other patterns),
if we keep representing multiple distinct vector/predicate types
using the same integer mode.  I guess this establishes a way of
coping with the situation in general.

So personally I'm OK with the patch, if Richi agrees.

Richard
  
Richard Biener Sept. 29, 2022, 1:46 p.m. UTC | #8
On Thu, Sep 29, 2022 at 12:17 PM Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> Andrew Stubbs <ams@codesourcery.com> writes:
> > On 29/09/2022 10:24, Richard Sandiford wrote:
> >> Otherwise:
> >>
> >>    operand0[0] = operand1 < operand2;
> >>    for (i = 1; i < operand3; i++)
> >>      operand0[i] = operand0[i - 1] && (operand1 + i < operand2);
> >>
> >> looks like a "length and mask" operation, which IIUC is also what
> >> RVV wanted?  (Wasn't at the Cauldron, so not entirely sure.)
> >>
> >> Perhaps the difference is that in this case the length must be constant.
> >> (Or is that true for RVV as well?)
> >
> > I too saw that presentation and I have compared notes with Juzhe before
> > posting this.
> >
> > As he has posted, what they want is different because their config
> > register has an explicit length field whereas GCN just uses a mask to
> > limit the length (more like AArch64, I think).
> >
> > The RVV solution uses different logic in the gimple IR; this proposal is
> > indistinguishable from the status quo at that point.
>
> Hmm, OK.  (And thanks to Juzhe for the summary.)
>
> I can't think of any existing examples of optabs that have a variable
> number of operands.  But maybe this is a good reason to change that.
>
> Having to add what amounts to a vector type descriptor to make up for
> the lack of mode information seems like a bit of a hack.  But it's
> possibly a hack that we'll need to do again (for other patterns),
> if we keep representing multiple distinct vector/predicate types
> using the same integer mode.  I guess this establishes a way of
> coping with the situation in general.
>
> So personally I'm OK with the patch, if Richi agrees.

It's not the nicest way of carrying the information but short of inventing
new modes I can't see something better (well, another optab).  I see
the GCN backend expects a constant in operand 3 but the docs don't
specify the operand has to be a CONST_INT, can you adjust them
accordingly?

Otherwise I'm fine with it.  It might even prove useful for x86.

Richard.

> Richard
  
Andrew Stubbs Oct. 3, 2022, 2:27 p.m. UTC | #9
On 29/09/2022 14:46, Richard Biener wrote:
> It's not the nicest way of carrying the information but short of inventing
> new modes I can't see something better (well, another optab).  I see
> the GCN backend expects a constant in operand 3 but the docs don't
> specify the operand has to be a CONST_INT, can you adjust them
> accordingly?
> 
> Otherwise I'm fine with it.  It might even prove useful for x86.

Thank you.

Here's what I pushed.

Andrew
vect: while_ult for integer masks

Add a vector length parameter needed by amdgcn without breaking aarch64.

All amdgcn vector masks are DImode, regardless of vector length, so we can't
tell what length is implied simply from the operator mode.  (Even if we used
different integer modes there's no mode small enough to differenciate a 2 or
4 lane mask).  Without knowing the intended length we end up using a mask with
too many lanes enabled, which leads to undefined behaviour..

The extra operand is not added for vector mask types so AArch64 does not need
to be adjusted.

gcc/ChangeLog:

	* config/gcn/gcn-valu.md (while_ultsidi): Limit mask length using
	operand 3.
	* doc/md.texi (while_ult): Document new operand 3 usage.
	* internal-fn.cc (expand_while_optab_fn): Set operand 3 when lhs_type
	maps to a non-vector mode.

diff --git a/gcc/config/gcn/gcn-valu.md b/gcc/config/gcn/gcn-valu.md
index 3bfdf8213fc..dec81e863f7 100644
--- a/gcc/config/gcn/gcn-valu.md
+++ b/gcc/config/gcn/gcn-valu.md
@@ -3052,7 +3052,8 @@ (define_expand "vcondu<V_ALL:mode><V_INT:mode>_exec"
 (define_expand "while_ultsidi"
   [(match_operand:DI 0 "register_operand")
    (match_operand:SI 1 "")
-   (match_operand:SI 2 "")]
+   (match_operand:SI 2 "")
+   (match_operand:SI 3 "")]
   ""
   {
     if (GET_CODE (operands[1]) != CONST_INT
@@ -3077,6 +3078,11 @@ (define_expand "while_ultsidi"
 			      : ~((unsigned HOST_WIDE_INT)-1 << diff));
 	emit_move_insn (operands[0], gen_rtx_CONST_INT (VOIDmode, mask));
       }
+    if (INTVAL (operands[3]) < 64)
+      emit_insn (gen_anddi3 (operands[0], operands[0],
+			     gen_rtx_CONST_INT (VOIDmode,
+						~((unsigned HOST_WIDE_INT)-1
+						  << INTVAL (operands[3])))));
     DONE;
   })
 
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index d46963f468c..bb42ee1da36 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -4950,9 +4950,10 @@ This pattern is not allowed to @code{FAIL}.
 @cindex @code{while_ult@var{m}@var{n}} instruction pattern
 @item @code{while_ult@var{m}@var{n}}
 Set operand 0 to a mask that is true while incrementing operand 1
-gives a value that is less than operand 2.  Operand 0 has mode @var{n}
-and operands 1 and 2 are scalar integers of mode @var{m}.
-The operation is equivalent to:
+gives a value that is less than operand 2, for a vector length up to operand 3.
+Operand 0 has mode @var{n} and operands 1 and 2 are scalar integers of mode
+@var{m}.  Operand 3 should be omitted when @var{n} is a vector mode, and
+a @code{CONST_INT} otherwise.  The operation for vector modes is equivalent to:
 
 @smallexample
 operand0[0] = operand1 < operand2;
@@ -4960,6 +4961,14 @@ for (i = 1; i < GET_MODE_NUNITS (@var{n}); i++)
   operand0[i] = operand0[i - 1] && (operand1 + i < operand2);
 @end smallexample
 
+And for non-vector modes the operation is equivalent to:
+
+@smallexample
+operand0[0] = operand1 < operand2;
+for (i = 1; i < operand3; i++)
+  operand0[i] = operand0[i - 1] && (operand1 + i < operand2);
+@end smallexample
+
 @cindex @code{check_raw_ptrs@var{m}} instruction pattern
 @item @samp{check_raw_ptrs@var{m}}
 Check whether, given two pointers @var{a} and @var{b} and a length @var{len},
diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index 651d99eaeb9..c306240c2ac 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -3664,7 +3664,7 @@ expand_direct_optab_fn (internal_fn fn, gcall *stmt, direct_optab optab,
 static void
 expand_while_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
 {
-  expand_operand ops[3];
+  expand_operand ops[4];
   tree rhs_type[2];
 
   tree lhs = gimple_call_lhs (stmt);
@@ -3680,10 +3680,24 @@ expand_while_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
       create_input_operand (&ops[i + 1], rhs_rtx, TYPE_MODE (rhs_type[i]));
     }
 
+  int opcnt;
+  if (!VECTOR_MODE_P (TYPE_MODE (lhs_type)))
+    {
+      /* When the mask is an integer mode the exact vector length may not
+	 be clear to the backend, so we pass it in operand[3].
+         Use the vector in arg2 for the most reliable intended size.  */
+      tree type = TREE_TYPE (gimple_call_arg (stmt, 2));
+      create_integer_operand (&ops[3], TYPE_VECTOR_SUBPARTS (type));
+      opcnt = 4;
+    }
+  else
+    /* The mask has a vector type so the length operand is unnecessary.  */
+    opcnt = 3;
+
   insn_code icode = convert_optab_handler (optab, TYPE_MODE (rhs_type[0]),
 					   TYPE_MODE (lhs_type));
 
-  expand_insn (icode, 3, ops);
+  expand_insn (icode, opcnt, ops);
   if (!rtx_equal_p (lhs_rtx, ops[0].value))
     emit_move_insn (lhs_rtx, ops[0].value);
 }
  

Patch

diff --git a/gcc/config/gcn/gcn-valu.md b/gcc/config/gcn/gcn-valu.md
index 3bfdf8213fc..dec81e863f7 100644
--- a/gcc/config/gcn/gcn-valu.md
+++ b/gcc/config/gcn/gcn-valu.md
@@ -3052,7 +3052,8 @@  (define_expand "vcondu<V_ALL:mode><V_INT:mode>_exec"
 (define_expand "while_ultsidi"
   [(match_operand:DI 0 "register_operand")
    (match_operand:SI 1 "")
-   (match_operand:SI 2 "")]
+   (match_operand:SI 2 "")
+   (match_operand:SI 3 "")]
   ""
   {
     if (GET_CODE (operands[1]) != CONST_INT
@@ -3077,6 +3078,11 @@  (define_expand "while_ultsidi"
 			      : ~((unsigned HOST_WIDE_INT)-1 << diff));
 	emit_move_insn (operands[0], gen_rtx_CONST_INT (VOIDmode, mask));
       }
+    if (INTVAL (operands[3]) < 64)
+      emit_insn (gen_anddi3 (operands[0], operands[0],
+			     gen_rtx_CONST_INT (VOIDmode,
+						~((unsigned HOST_WIDE_INT)-1
+						  << INTVAL (operands[3])))));
     DONE;
   })
 
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index d46963f468c..d8e2a5a83f4 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -4950,9 +4950,10 @@  This pattern is not allowed to @code{FAIL}.
 @cindex @code{while_ult@var{m}@var{n}} instruction pattern
 @item @code{while_ult@var{m}@var{n}}
 Set operand 0 to a mask that is true while incrementing operand 1
-gives a value that is less than operand 2.  Operand 0 has mode @var{n}
-and operands 1 and 2 are scalar integers of mode @var{m}.
-The operation is equivalent to:
+gives a value that is less than operand 2, for a vector length up to operand 3.
+Operand 0 has mode @var{n} and operands 1 to 3 are scalar integers of mode
+@var{m}.  Operand 3 should be omitted when @var{n} is a vector mode.  The
+operation for vector modes is equivalent to:
 
 @smallexample
 operand0[0] = operand1 < operand2;
@@ -4960,6 +4961,14 @@  for (i = 1; i < GET_MODE_NUNITS (@var{n}); i++)
   operand0[i] = operand0[i - 1] && (operand1 + i < operand2);
 @end smallexample
 
+And for non-vector modes the operation is equivalent to:
+
+@smallexample
+operand0[0] = operand1 < operand2;
+for (i = 1; i < operand3; i++)
+  operand0[i] = operand0[i - 1] && (operand1 + i < operand2);
+@end smallexample
+
 @cindex @code{check_raw_ptrs@var{m}} instruction pattern
 @item @samp{check_raw_ptrs@var{m}}
 Check whether, given two pointers @var{a} and @var{b} and a length @var{len},
diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index 651d99eaeb9..c306240c2ac 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -3664,7 +3664,7 @@  expand_direct_optab_fn (internal_fn fn, gcall *stmt, direct_optab optab,
 static void
 expand_while_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
 {
-  expand_operand ops[3];
+  expand_operand ops[4];
   tree rhs_type[2];
 
   tree lhs = gimple_call_lhs (stmt);
@@ -3680,10 +3680,24 @@  expand_while_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
       create_input_operand (&ops[i + 1], rhs_rtx, TYPE_MODE (rhs_type[i]));
     }
 
+  int opcnt;
+  if (!VECTOR_MODE_P (TYPE_MODE (lhs_type)))
+    {
+      /* When the mask is an integer mode the exact vector length may not
+	 be clear to the backend, so we pass it in operand[3].
+         Use the vector in arg2 for the most reliable intended size.  */
+      tree type = TREE_TYPE (gimple_call_arg (stmt, 2));
+      create_integer_operand (&ops[3], TYPE_VECTOR_SUBPARTS (type));
+      opcnt = 4;
+    }
+  else
+    /* The mask has a vector type so the length operand is unnecessary.  */
+    opcnt = 3;
+
   insn_code icode = convert_optab_handler (optab, TYPE_MODE (rhs_type[0]),
 					   TYPE_MODE (lhs_type));
 
-  expand_insn (icode, 3, ops);
+  expand_insn (icode, opcnt, ops);
   if (!rtx_equal_p (lhs_rtx, ops[0].value))
     emit_move_insn (lhs_rtx, ops[0].value);
 }