vect: while_ult for integer mask
Commit Message
This patch is a prerequisite for some amdgcn patches I'm working on to
support shorter vector lengths (having fixed 64 lanes tends to miss
optimizations, and masking is not supported everywhere yet).
The problem is that, unlike AArch64, I'm not using different mask modes
for different-sized vectors, so all loops end up using the while_ultsidi
pattern, regardless of vector length. In theory I could use SImode for
V32, HImode for V16, etc., but there's no mode to fit V4 or V2, so
something else is needed. Moving to vector masks in the backend
is not a natural fit for GCN, and would be a huge task in any case.
This patch adds an additional length operand so that we can distinguish
the different uses in the back end and don't end up with more lanes
enabled than there ought to be.
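To make that concrete, here is a scalar C sketch (illustrative only, not
part of the patch) of the DImode mask a GCN while_ultsidi with the new
length operand produces; base/limit/len are just illustrative names:

/* Model of while_ultsidi for a 64-bit integer mask: lanes [0, len)
   follow the usual while_ult chain, all higher lanes remain clear.  */
unsigned long long
while_ult_mask (unsigned int base, unsigned int limit, int len)
{
  unsigned long long mask = 0;
  int active = 1;
  for (int lane = 0; lane < len; lane++)
    {
      active = active && (base + (unsigned int) lane < limit);
      if (active)
        mask |= 1ULL << lane;
    }
  return mask;
}

For example, base 4, limit 6 and len 4 gives 0x3: lanes 0 and 1 satisfy
base + i < limit, and lanes 2 through 63 stay zero, so a notional 4-lane
vector never has more than 4 lanes enabled.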
I've made the extra operand conditional on the mode so that I don't have
to modify the AArch64 backend; that backend uses the while_<cond> family
of operators in a lot of places, via iterators, so it would end up
touching a lot of code just to add an inactive operand, plus I don't
have a way to test it properly. I've confirmed that AArch64 builds and
expands while_ult correctly in a simple example.
OK for mainline?
Thanks
Andrew
vect: while_ult for integer masks
Add a vector length parameter needed by amdgcn without breaking aarch64.
All amdgcn vector masks are DImode, regardless of vector length, so we can't
tell what length is implied simply from the operator mode. (Even if we used
different integer modes, there's no mode small enough to differentiate a 2-
or 4-lane mask.) Without knowing the intended length we end up using a mask
with too many lanes enabled, which leads to undefined behaviour.
The extra operand is not added for vector mask types so AArch64 does not need
to be adjusted.
gcc/ChangeLog:
* config/gcn/gcn-valu.md (while_ultsidi): Limit mask length using
operand 3.
* doc/md.texi (while_ult): Document new operand 3 usage.
* internal-fn.cc (expand_while_optab_fn): Set operand 3 when lhs_type
maps to a non-vector mode.
Comments
On Wed, Sep 28, 2022 at 5:06 PM Andrew Stubbs <ams@codesourcery.com> wrote:
> [...]
> OK for mainline?
Hmm, but you could introduce BI4mode and BI2mode for V4 and V2, no?
Not sure if it is possible to have two partial integer modes and use those.
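Something like the following in a gcn-modes.def, perhaps (untested sketch;
whether PARTIAL_INT_MODE accepts such small precisions is exactly the
open question):

/* Hypothetical sub-byte mask modes -- names and viability unverified.
   PARTIAL_INT_MODE (MODE, PRECISION, NAME) declares a PARTIAL_INT mode
   with the storage size of MODE and the given precision.  */
PARTIAL_INT_MODE (QI, 2, BI2);
PARTIAL_INT_MODE (QI, 4, BI4);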
Richard.
Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> On Wed, Sep 28, 2022 at 5:06 PM Andrew Stubbs <ams@codesourcery.com> wrote:
>> [...]
>
> Hmm, but you could introduce BI4mode and BI2mode for V4 and V2, no?
> Not sure if it is possible to have two partial integer modes and use those.
Might be difficult to do cleanly, since BI is very much a special case.
But I agree that that would better fit the existing scheme.
Otherwise:
operand0[0] = operand1 < operand2;
for (i = 1; i < operand3; i++)
operand0[i] = operand0[i - 1] && (operand1 + i < operand2);
looks like a "length and mask" operation, which IIUC is also what
RVV wanted? (Wasn't at the Cauldron, so not entirely sure.)
Perhaps the difference is that in this case the length must be constant.
(Or is that true for RVV as well?)
Thanks,
Richard
On Thu, Sep 29, 2022 at 11:24 AM Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> [...]
> > Hmm, but you could introduce BI4mode and BI2mode for V4 and V2, no?
> > Not sure if it is possible to have two partial integer modes and use those.
>
> Might be difficult to do cleanly, since BI is very much a special case.
> But I agree that that would better fit the existing scheme.
>
> Otherwise:
>
> operand0[0] = operand1 < operand2;
> for (i = 1; i < operand3; i++)
> operand0[i] = operand0[i - 1] && (operand1 + i < operand2);
>
> looks like a "length and mask" operation, which IIUC is also what
> RVV wanted? (Wasn't at the Cauldron, so not entirely sure.)
>
> Perhaps the difference is that in this case the length must be constant.
> (Or is that true for RVV as well?)
I think the length is variable and queried at runtime, but it might also be
used when compiling with a fixed-length vector size.
Note x86 with its integer-mode AVX512 masks runs into similar issues,
but just uses QImode to DImode (and doesn't exercise this particular pattern).
It basically relies on the actual machine instructions not enabling the
extra lanes, like when doing a V2DFmode compare to produce a mask.
For while_ult that's of course a bit hard to achieve (sadly AVX512 doesn't
have any such capability, and my attempts to emulate it have been either
unsuccessful or slow).
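For instance, something like this AVX512VL snippet (an illustrative sketch,
not code from the x86 backend; compile with -mavx512vl) produces a __mmask8
in which only bits 0 and 1 can ever be set, so the unused mask bits are
naturally zero:

#include <immintrin.h>

/* Compare two V2DF vectors; the resulting k-mask has at most
   bits 0-1 set, so "extra lanes" never get enabled.  */
__mmask8
cmp_v2df (__m128d a, __m128d b)
{
  return _mm_cmp_pd_mask (a, b, _CMP_LT_OQ);
}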
Richard.
No, this is not what RVV wants.
Here is an example of how RVV uses length and mask:
for (int i = 0; i < n; i++)
  if (cond[i] == 0)
    a[i] = b[i] + c[i];
The gimple IR would be something like this:
AVL = number of elements to be updated in this iteration.
TVL = total number of elements to be updated.
vect_0 = vector of cond[i]
vect_1 = vector of a[i]
vect_2 = vector of b[i]
vect_3 = vector of c[i]
....
body:
AVL = WHILE_LEN (TVL, .....)
This is a new pattern I am adding for RVV to calculate the active vector
length; the pattern assumes AVL <= TVL (always).
mask_0 = vect_0 == {0,0,0,....}
A comparison to generate the mask for predication.
vect_1 = len_cond_len (mask_0, vect_1, vect_2, vect_3, AVL)
This is also a new pattern: when the predicate (mask bit == 1 && element
index < AVL) is 1, update vect_1 = vect_2 + vect_3; otherwise vect_1 = vect_1.
......
TVL = TVL - AVL  (decrease the counter until it becomes 0, then exit the loop)
if (TVL == 0)
  exit loop
......
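A rough scalar C model of that strip-mined loop (hypothetical names;
VLMAX stands in for the hardware's maximum vector length):

/* Scalar model of the RVV length-and-mask loop: AVL elements are
   processed per iteration, predicated on both the mask and AVL.  */
#define VLMAX 8

void
masked_add (int *a, const int *b, const int *c, const int *cond, int tvl)
{
  while (tvl > 0)
    {
      int avl = tvl < VLMAX ? tvl : VLMAX;  /* models WHILE_LEN */
      for (int i = 0; i < avl; i++)         /* length predicate */
        if (cond[i] == 0)                   /* mask predicate */
          a[i] = b[i] + c[i];
      a += avl; b += avl; c += avl; cond += avl;
      tvl -= avl;                           /* countdown to loop exit */
    }
}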
juzhe.zhong@rivai.ai
On 29/09/2022 08:52, Richard Biener wrote:
> On Wed, Sep 28, 2022 at 5:06 PM Andrew Stubbs <ams@codesourcery.com> wrote:
>> [...]
>
> Hmm, but you could introduce BI4mode and BI2mode for V4 and V2, no?
> Not sure if it is possible to have two partial integer modes and use those.
When we first did this port we tried to use V64BImode for masks and got
into a horrible mess; DImode works much better. In any case, at this
point retrofitting new mask types into the back end would be a big job.
We also have the problem that the mask register is actually two 32-bit
registers so if you try to use smaller modes the compiler ends up
leaving the high part undefined and bad things happen. Basically,
regardless of the notional size of the vector, the mask really is
64-bit, and the high bits really do have to be well defined (to zero).
The problem is simply that while_ult has lost information in the
lowering and expanding process. The size of the vector was clear in
gimple, but lost in RTL.
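For example, in gimple the call looks something like this (a hand-written
sketch, not an actual dump):

mask_5 = .WHILE_ULT (ix_7, niters_2, { 0, ... });

where the vector boolean type of mask_5 and of the dummy third argument
(say, vector(4) <signed-boolean:1>) still says "4 lanes"; once that expands
to a DImode pseudo, nothing in the RTL says the mask is only 4 lanes wide.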
Andrew
On 29/09/2022 10:24, Richard Sandiford wrote:
> Otherwise:
>
> operand0[0] = operand1 < operand2;
> for (i = 1; i < operand3; i++)
> operand0[i] = operand0[i - 1] && (operand1 + i < operand2);
>
> looks like a "length and mask" operation, which IIUC is also what
> RVV wanted? (Wasn't at the Cauldron, so not entirely sure.)
>
> Perhaps the difference is that in this case the length must be constant.
> (Or is that true for RVV as well?)
I too saw that presentation and I have compared notes with Juzhe before
posting this.
As he has posted, what they want is different because their config
register has an explicit length field whereas GCN just uses a mask to
limit the length (more like AArch64, I think).
The RVV solution uses different logic in the gimple IR; this proposal is
indistinguishable from the status quo at that point.
Andrew
Andrew Stubbs <ams@codesourcery.com> writes:
> [...]
Hmm, OK. (And thanks to Juzhe for the summary.)
I can't think of any existing examples of optabs that have a variable
number of operands. But maybe this is a good reason to change that.
Having to add what amounts to a vector type descriptor to make up for
the lack of mode information seems like a bit of a hack. But it's
possibly a hack that we'll need to do again (for other patterns),
if we keep representing multiple distinct vector/predicate types
using the same integer mode. I guess this establishes a way of
coping with the situation in general.
So personally I'm OK with the patch, if Richi agrees.
Richard
On Thu, Sep 29, 2022 at 12:17 PM Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> [...]
> So personally I'm OK with the patch, if Richi agrees.
It's not the nicest way of carrying the information, but short of inventing
new modes I can't see anything better (well, other than another optab). I see
the GCN backend expects a constant in operand 3, but the docs don't specify
that the operand has to be a CONST_INT; can you adjust them accordingly?
Otherwise I'm fine with it. It might even prove useful for x86.
Richard.
On 29/09/2022 14:46, Richard Biener wrote:
> [...]
> Otherwise I'm fine with it. It might even prove useful for x86.
Thank you.
Here's what I pushed.
Andrew
vect: while_ult for integer masks
Add a vector length parameter needed by amdgcn without breaking aarch64.
All amdgcn vector masks are DImode, regardless of vector length, so we can't
tell what length is implied simply from the operator mode. (Even if we used
different integer modes, there's no mode small enough to differentiate a 2-
or 4-lane mask.) Without knowing the intended length we end up using a mask
with too many lanes enabled, which leads to undefined behaviour.
The extra operand is not added for vector mask types so AArch64 does not need
to be adjusted.
gcc/ChangeLog:
* config/gcn/gcn-valu.md (while_ultsidi): Limit mask length using
operand 3.
* doc/md.texi (while_ult): Document new operand 3 usage.
* internal-fn.cc (expand_while_optab_fn): Set operand 3 when lhs_type
maps to a non-vector mode.
diff --git a/gcc/config/gcn/gcn-valu.md b/gcc/config/gcn/gcn-valu.md
index 3bfdf8213fc..dec81e863f7 100644
--- a/gcc/config/gcn/gcn-valu.md
+++ b/gcc/config/gcn/gcn-valu.md
@@ -3052,7 +3052,8 @@ (define_expand "vcondu<V_ALL:mode><V_INT:mode>_exec"
(define_expand "while_ultsidi"
[(match_operand:DI 0 "register_operand")
(match_operand:SI 1 "")
- (match_operand:SI 2 "")]
+ (match_operand:SI 2 "")
+ (match_operand:SI 3 "")]
""
{
if (GET_CODE (operands[1]) != CONST_INT
@@ -3077,6 +3078,11 @@ (define_expand "while_ultsidi"
: ~((unsigned HOST_WIDE_INT)-1 << diff));
emit_move_insn (operands[0], gen_rtx_CONST_INT (VOIDmode, mask));
}
+ if (INTVAL (operands[3]) < 64)
+ emit_insn (gen_anddi3 (operands[0], operands[0],
+ gen_rtx_CONST_INT (VOIDmode,
+ ~((unsigned HOST_WIDE_INT)-1
+ << INTVAL (operands[3])))));
DONE;
})
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index d46963f468c..bb42ee1da36 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -4950,9 +4950,10 @@ This pattern is not allowed to @code{FAIL}.
@cindex @code{while_ult@var{m}@var{n}} instruction pattern
@item @code{while_ult@var{m}@var{n}}
Set operand 0 to a mask that is true while incrementing operand 1
-gives a value that is less than operand 2. Operand 0 has mode @var{n}
-and operands 1 and 2 are scalar integers of mode @var{m}.
-The operation is equivalent to:
+gives a value that is less than operand 2, for a vector length up to operand 3.
+Operand 0 has mode @var{n} and operands 1 and 2 are scalar integers of mode
+@var{m}. Operand 3 should be omitted when @var{n} is a vector mode, and
+a @code{CONST_INT} otherwise. The operation for vector modes is equivalent to:
@smallexample
operand0[0] = operand1 < operand2;
@@ -4960,6 +4961,14 @@ for (i = 1; i < GET_MODE_NUNITS (@var{n}); i++)
operand0[i] = operand0[i - 1] && (operand1 + i < operand2);
@end smallexample
+And for non-vector modes the operation is equivalent to:
+
+@smallexample
+operand0[0] = operand1 < operand2;
+for (i = 1; i < operand3; i++)
+ operand0[i] = operand0[i - 1] && (operand1 + i < operand2);
+@end smallexample
+
@cindex @code{check_raw_ptrs@var{m}} instruction pattern
@item @samp{check_raw_ptrs@var{m}}
Check whether, given two pointers @var{a} and @var{b} and a length @var{len},
diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index 651d99eaeb9..c306240c2ac 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -3664,7 +3664,7 @@ expand_direct_optab_fn (internal_fn fn, gcall *stmt, direct_optab optab,
static void
expand_while_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
{
- expand_operand ops[3];
+ expand_operand ops[4];
tree rhs_type[2];
tree lhs = gimple_call_lhs (stmt);
@@ -3680,10 +3680,24 @@ expand_while_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
create_input_operand (&ops[i + 1], rhs_rtx, TYPE_MODE (rhs_type[i]));
}
+ int opcnt;
+ if (!VECTOR_MODE_P (TYPE_MODE (lhs_type)))
+ {
+ /* When the mask is an integer mode the exact vector length may not
+ be clear to the backend, so we pass it in operand[3].
+ Use the vector in arg2 for the most reliable intended size. */
+ tree type = TREE_TYPE (gimple_call_arg (stmt, 2));
+ create_integer_operand (&ops[3], TYPE_VECTOR_SUBPARTS (type));
+ opcnt = 4;
+ }
+ else
+ /* The mask has a vector type so the length operand is unnecessary. */
+ opcnt = 3;
+
insn_code icode = convert_optab_handler (optab, TYPE_MODE (rhs_type[0]),
TYPE_MODE (lhs_type));
- expand_insn (icode, 3, ops);
+ expand_insn (icode, opcnt, ops);
if (!rtx_equal_p (lhs_rtx, ops[0].value))
emit_move_insn (lhs_rtx, ops[0].value);
}