[x86] PR target/110588: Add *bt<mode>_setncqi_2 to generate btl

Message ID 042301d9b5ac$f84e3e60$e8eabb20$@nextmovesoftware.com
State Accepted
Headers
Series [x86] PR target/110588: Add *bt<mode>_setncqi_2 to generate btl |

Checks

Context Check Description
snail/gcc-patch-check success Github commit url

Commit Message

Roger Sayle July 13, 2023, 5:10 p.m. UTC
  This patch resolves PR target/110588 to catch another case in combine
where the i386 backend should be generating a btl instruction.  This adds
another define_insn_and_split to recognize the RTL representation for this
case.

I also noticed that two related define_insn_and_split weren't using the
preferred string style for single statement preparation-statements, so
I've reformatted these to be consistent in style with the new one.

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32}
with no new failures.  Ok for mainline?


2023-07-13  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
        PR target/110588
        * config/i386/i386.md (*bt<mode>_setcqi): Prefer string form
        preparation statement over braces for a single statement.
        (*bt<mode>_setncqi): Likewise.
        (*bt<mode>_setncqi_2): New define_insn_and_split.

gcc/testsuite/ChangeLog
        PR target/110588
        * gcc.target/i386/pr110588.c: New test case.


Thanks again,
Roger
--
  

Comments

Uros Bizjak July 13, 2023, 6:21 p.m. UTC | #1
On Thu, Jul 13, 2023 at 7:10 PM Roger Sayle <roger@nextmovesoftware.com> wrote:
>
>
> This patch resolves PR target/110588 to catch another case in combine
> where the i386 backend should be generating a btl instruction.  This adds
> another define_insn_and_split to recognize the RTL representation for this
> case.
>
> I also noticed that two related define_insn_and_split weren't using the
> preferred string style for single statement preparation-statements, so
> I've reformatted these to be consistent in style with the new one.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures.  Ok for mainline?
>
>
> 2023-07-13  Roger Sayle  <roger@nextmovesoftware.com>
>
> gcc/ChangeLog
>         PR target/110588
>         * config/i386/i386.md (*bt<mode>_setcqi): Prefer string form
>         preparation statement over braces for a single statement.
>         (*bt<mode>_setncqi): Likewise.
>         (*bt<mode>_setncqi_2): New define_insn_and_split.
>
> gcc/testsuite/ChangeLog
>         PR target/110588
>         * gcc.target/i386/pr110588.c: New test case.

+;; Help combine recognize bt followed by setnc (PR target/110588)
+(define_insn_and_split "*bt<mode>_setncqi_2"
+  [(set (match_operand:QI 0 "register_operand")
+ (eq:QI
+  (zero_extract:SWI48
+    (match_operand:SWI48 1 "register_operand")
+    (const_int 1)
+    (zero_extend:SI (match_operand:QI 2 "register_operand")))
+  (const_int 0)))
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_USE_BT && ix86_pre_reload_split ()"
+  "#"
+  "&& 1"
+  [(set (reg:CCC FLAGS_REG)
+        (compare:CCC
+         (zero_extract:SWI48 (match_dup 1) (const_int 1) (match_dup 2))
+         (const_int 0)))
+   (set (match_dup 0)
+        (ne:QI (reg:CCC FLAGS_REG) (const_int 0)))]
+  "operands[2] = lowpart_subreg (SImode, operands[2], QImode);")

I don't think the above transformation is 100% correct, mainly due to
the use of paradoxical subreg.

The combined instruction is operating with a zero_extended QImode
register, so all bits of the register are well defined. You are
splitting using paradoxical subreg, so you don't know what garbage is
there in the highpart of the count register. However, BTL/BTQ uses
modulo 64 (or 32) of this register, so even with a slightly invalid
RTX, everything checks out.

+  "operands[2] = lowpart_subreg (SImode, operands[2], QImode);")

You probably need <SWI48>mode instead of SImode here.

Otherwise OK.

Thanks,
Uros.
  
Roger Sayle July 14, 2023, 9:27 a.m. UTC | #2
> From: Uros Bizjak <ubizjak@gmail.com>
> Sent: 13 July 2023 19:21
> 
> On Thu, Jul 13, 2023 at 7:10 PM Roger Sayle <roger@nextmovesoftware.com>
> wrote:
> >
> > This patch resolves PR target/110588 to catch another case in combine
> > where the i386 backend should be generating a btl instruction.  This
> > adds another define_insn_and_split to recognize the RTL representation
> > for this case.
> >
> > I also noticed that two related define_insn_and_split weren't using
> > the preferred string style for single statement
> > preparation-statements, so I've reformatted these to be consistent in style with
> the new one.
> >
> > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > and make -k check, both with and without --target_board=unix{-m32}
> > with no new failures.  Ok for mainline?
> >
> >
> > 2023-07-13  Roger Sayle  <roger@nextmovesoftware.com>
> >
> > gcc/ChangeLog
> >         PR target/110588
> >         * config/i386/i386.md (*bt<mode>_setcqi): Prefer string form
> >         preparation statement over braces for a single statement.
> >         (*bt<mode>_setncqi): Likewise.
> >         (*bt<mode>_setncqi_2): New define_insn_and_split.
> >
> > gcc/testsuite/ChangeLog
> >         PR target/110588
> >         * gcc.target/i386/pr110588.c: New test case.
> 
> +;; Help combine recognize bt followed by setnc (PR target/110588)
> +(define_insn_and_split "*bt<mode>_setncqi_2"
> +  [(set (match_operand:QI 0 "register_operand")  (eq:QI
> +  (zero_extract:SWI48
> +    (match_operand:SWI48 1 "register_operand")
> +    (const_int 1)
> +    (zero_extend:SI (match_operand:QI 2 "register_operand")))
> +  (const_int 0)))
> +   (clobber (reg:CC FLAGS_REG))]
> +  "TARGET_USE_BT && ix86_pre_reload_split ()"
> +  "#"
> +  "&& 1"
> +  [(set (reg:CCC FLAGS_REG)
> +        (compare:CCC
> +         (zero_extract:SWI48 (match_dup 1) (const_int 1) (match_dup 2))
> +         (const_int 0)))
> +   (set (match_dup 0)
> +        (ne:QI (reg:CCC FLAGS_REG) (const_int 0)))]
> +  "operands[2] = lowpart_subreg (SImode, operands[2], QImode);")
> 
> I don't think the above transformation is 100% correct, mainly due to the use of
> paradoxical subreg.
> 
> The combined instruction is operating with a zero_extended QImode register, so
> all bits of the register are well defined. You are splitting using paradoxical subreg,
> so you don't know what garbage is there in the highpart of the count register.
> However, BTL/BTQ uses modulo 64 (or 32) of this register, so even with a slightly
> invalid RTX, everything checks out.
> 
> +  "operands[2] = lowpart_subreg (SImode, operands[2], QImode);")
> 
> You probably need <SWI48>mode instead of SImode here.

The define_insn for *bt is:

(define_insn "*bt<mode>"
  [(set (reg:CCC FLAGS_REG)
        (compare:CCC
          (zero_extract:SWI48
            (match_operand:SWI48 0 "nonimmediate_operand" "r,m")
            (const_int 1)
            (match_operand:SI 1 "nonmemory_operand" "r<S>,<S>"))
          (const_int 0)))]

So <SWI48> isn't appropriate here.

But now you've made me think about it, it's inconsistent that all of the shifts
and rotates in i386.md standardize on QImode for shift counts, but the bit test
instructions use SImode?  I think this explains where the paradoxical SUBREGs
come from, and in theory any_extend from QImode to SImode here could/should 
be handled/unnecessary.

Is it worth investigating a follow-up patch to convert all ZERO_EXTRACTs and
SIGN_EXTRACTs in i386.md to use QImode (instead of SImode)?

Thanks in advance,
Roger
--
  
Uros Bizjak July 14, 2023, 10:03 a.m. UTC | #3
On Fri, Jul 14, 2023 at 11:27 AM Roger Sayle <roger@nextmovesoftware.com> wrote:
>
>
> > From: Uros Bizjak <ubizjak@gmail.com>
> > Sent: 13 July 2023 19:21
> >
> > On Thu, Jul 13, 2023 at 7:10 PM Roger Sayle <roger@nextmovesoftware.com>
> > wrote:
> > >
> > > This patch resolves PR target/110588 to catch another case in combine
> > > where the i386 backend should be generating a btl instruction.  This
> > > adds another define_insn_and_split to recognize the RTL representation
> > > for this case.
> > >
> > > I also noticed that two related define_insn_and_split weren't using
> > > the preferred string style for single statement
> > > preparation-statements, so I've reformatted these to be consistent in style with
> > the new one.
> > >
> > > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > > and make -k check, both with and without --target_board=unix{-m32}
> > > with no new failures.  Ok for mainline?
> > >
> > >
> > > 2023-07-13  Roger Sayle  <roger@nextmovesoftware.com>
> > >
> > > gcc/ChangeLog
> > >         PR target/110588
> > >         * config/i386/i386.md (*bt<mode>_setcqi): Prefer string form
> > >         preparation statement over braces for a single statement.
> > >         (*bt<mode>_setncqi): Likewise.
> > >         (*bt<mode>_setncqi_2): New define_insn_and_split.
> > >
> > > gcc/testsuite/ChangeLog
> > >         PR target/110588
> > >         * gcc.target/i386/pr110588.c: New test case.
> >
> > +;; Help combine recognize bt followed by setnc (PR target/110588)
> > +(define_insn_and_split "*bt<mode>_setncqi_2"
> > +  [(set (match_operand:QI 0 "register_operand")  (eq:QI
> > +  (zero_extract:SWI48
> > +    (match_operand:SWI48 1 "register_operand")
> > +    (const_int 1)
> > +    (zero_extend:SI (match_operand:QI 2 "register_operand")))
> > +  (const_int 0)))
> > +   (clobber (reg:CC FLAGS_REG))]
> > +  "TARGET_USE_BT && ix86_pre_reload_split ()"
> > +  "#"
> > +  "&& 1"
> > +  [(set (reg:CCC FLAGS_REG)
> > +        (compare:CCC
> > +         (zero_extract:SWI48 (match_dup 1) (const_int 1) (match_dup 2))
> > +         (const_int 0)))
> > +   (set (match_dup 0)
> > +        (ne:QI (reg:CCC FLAGS_REG) (const_int 0)))]
> > +  "operands[2] = lowpart_subreg (SImode, operands[2], QImode);")
> >
> > I don't think the above transformation is 100% correct, mainly due to the use of
> > paradoxical subreg.
> >
> > The combined instruction is operating with a zero_extended QImode register, so
> > all bits of the register are well defined. You are splitting using paradoxical subreg,
> > so you don't know what garbage is there in the highpart of the count register.
> > However, BTL/BTQ uses modulo 64 (or 32) of this register, so even with a slightly
> > invalid RTX, everything checks out.
> >
> > +  "operands[2] = lowpart_subreg (SImode, operands[2], QImode);")
> >
> > You probably need <SWI48>mode instead of SImode here.
>
> The define_insn for *bt is:
>
> (define_insn "*bt<mode>"
>   [(set (reg:CCC FLAGS_REG)
>         (compare:CCC
>           (zero_extract:SWI48
>             (match_operand:SWI48 0 "nonimmediate_operand" "r,m")
>             (const_int 1)
>             (match_operand:SI 1 "nonmemory_operand" "r<S>,<S>"))
>           (const_int 0)))]
>
> So <SWI48> isn't appropriate here.
>
> But now you've made me think about it, it's inconsistent that all of the shifts
> and rotates in i386.md standardize on QImode for shift counts, but the bit test
> instructions use SImode?  I think this explains where the paradoxical SUBREGs
> come from, and in theory any_extend from QImode to SImode here could/should
> be handled/unnecessary.
>
> Is it worth investigating a follow-up patch to convert all ZERO_EXTRACTs and
> SIGN_EXTRACTs in i386.md to use QImode (instead of SImode)?

IIRC, zero_extract was moved from modeless to a pattern with defined
mode a while ago. Perhaps SImode is just because of these ancient
times, and BT pattern was written that way to satisfy combine. I think
it is definitely worth investigating; perhaps some BT-related pattern
will become obsolete because of the change.

Uros.
  

Patch

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index e47ced1..04eca049 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -16170,9 +16170,7 @@ 
          (const_int 0)))
    (set (match_dup 0)
         (eq:QI (reg:CCC FLAGS_REG) (const_int 0)))]
-{
-  operands[2] = lowpart_subreg (SImode, operands[2], QImode);
-})
+  "operands[2] = lowpart_subreg (SImode, operands[2], QImode);")
 
 ;; Help combine recognize bt followed by setnc
 (define_insn_and_split "*bt<mode>_setncqi"
@@ -16193,9 +16191,7 @@ 
          (const_int 0)))
    (set (match_dup 0)
         (ne:QI (reg:CCC FLAGS_REG) (const_int 0)))]
-{
-  operands[2] = lowpart_subreg (SImode, operands[2], QImode);
-})
+  "operands[2] = lowpart_subreg (SImode, operands[2], QImode);")
 
 (define_insn_and_split "*bt<mode>_setnc<mode>"
   [(set (match_operand:SWI48 0 "register_operand")
@@ -16219,6 +16215,27 @@ 
   operands[2] = lowpart_subreg (SImode, operands[2], QImode);
   operands[3] = gen_reg_rtx (QImode);
 })
+
+;; Help combine recognize bt followed by setnc (PR target/110588)
+(define_insn_and_split "*bt<mode>_setncqi_2"
+  [(set (match_operand:QI 0 "register_operand")
+	(eq:QI
+	  (zero_extract:SWI48
+ 	    (match_operand:SWI48 1 "register_operand")
+	    (const_int 1)
+	    (zero_extend:SI (match_operand:QI 2 "register_operand")))
+	  (const_int 0)))
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_USE_BT && ix86_pre_reload_split ()"
+  "#"
+  "&& 1"
+  [(set (reg:CCC FLAGS_REG)
+        (compare:CCC
+         (zero_extract:SWI48 (match_dup 1) (const_int 1) (match_dup 2))
+         (const_int 0)))
+   (set (match_dup 0)
+        (ne:QI (reg:CCC FLAGS_REG) (const_int 0)))]
+  "operands[2] = lowpart_subreg (SImode, operands[2], QImode);")
 
 ;; Store-flag instructions.
 
diff --git a/gcc/testsuite/gcc.target/i386/pr110588.c b/gcc/testsuite/gcc.target/i386/pr110588.c
new file mode 100644
index 0000000..4505c87
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr110588.c
@@ -0,0 +1,18 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -mtune=core2" } */
+
+unsigned char foo (unsigned char x, int y)
+{
+  int _1 = (int) x;
+  int _2 = _1 >> y;
+  int _3 = _2 & 1;
+  unsigned char _8 = (unsigned char) _3;
+  unsigned char _6 = _8 ^ 1;
+  return _6;
+}
+
+/* { dg-final { scan-assembler "btl" } } */
+/* { dg-final { scan-assembler "setnc" } } */
+/* { dg-final { scan-assembler-not "sarl" } } */
+/* { dg-final { scan-assembler-not "andl" } } */
+/* { dg-final { scan-assembler-not "xorl" } } */