Message ID | 20220719060736.18399-1-hongtao.liu@intel.com |
---|---|
State | New, archived |
Headers |
Return-Path: <gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a98:d5ce:0:b0:178:cc93:bf7d with SMTP id g14csp2260599eik; Mon, 18 Jul 2022 23:08:34 -0700 (PDT) X-Google-Smtp-Source: AGRyM1urg8eXLffZFZbndWym52I09n0Dpi+moUcPygMpQ8vxDhGgrlySwxR41kaIy9JznsNBjUZa X-Received: by 2002:a05:6402:150d:b0:43a:2cac:ca24 with SMTP id f13-20020a056402150d00b0043a2cacca24mr41797739edw.110.1658210914398; Mon, 18 Jul 2022 23:08:34 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1658210914; cv=none; d=google.com; s=arc-20160816; b=oFj78NlVtxacgsU0xIQrPhFb93KhyGOuSDbkchCR6l6xM9upTN+nAKDgD0jzxA/Mc5 v2dwsyVKBKkSZB77YefrESMMKNHs3umtPmizWqoFYL1HNN/KtBhJyhV6UZMUK3eY9+fT 78uVDzVEbZIBPCRBlluzPBvgPsDT1NzD4KNdZRGj8rfZick5VJJUIioe9bmyDRGznMy4 qhaCOZD2GifkkzbEpthDO0kRaIwOvuJv6XvzCtAy71wJbJrI82O52jNNJdDZXc7M3hsI gKrV200JhYvrC+EhVb/bF4FaVQYtN6fobEdR4T92eU1b3ZaL5fpB7Nb4ACXBfTHSV2dj tN4w== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:reply-to:from:list-subscribe:list-help:list-post :list-archive:list-unsubscribe:list-id:precedence:references :in-reply-to:message-id:date:subject:to:dmarc-filter:delivered-to :dkim-signature:dkim-filter; bh=5tS3u76xxafRjV8k7jhpjP4MbCp66NApyDkF7WhOrT4=; b=ZLns58CO0Yb5pG7msEO9PPuI0LE5Gn3LDO+wAvJeHSz4eQvR8YS4kj5clyQb4r0a4I emDRijLSm4DSzFcl9yjR9B3Z22KmRM6iea9sLwPfYQt4LwTxGgEFoklFRDmkBsuQJ+8y N4JHML0FR37Qnp0vmjDcZc/H/ja3NXWHwDXeZ5Z4/pBAC9ENY9CheIotsugDeX1qvHqf +c3yqBw3Uf/9EpjgnmAQTm3eEa03Yt8oRLQMXu4kies8Yc5GiT3RZYusQBboY8bxeyYS gnN+z1ALjng36DqeUt9gWdm81NtUv9l/iNGNNlRBRdAjVwQR6iqxYGh/mBExAk6QiX3d r3tA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gcc.gnu.org header.s=default header.b=cSmyyICr; spf=pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 8.43.85.97 as permitted sender) smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=gnu.org Received: from sourceware.org (server2.sourceware.org. [8.43.85.97]) by mx.google.com with ESMTPS id sb18-20020a1709076d9200b0072b9ca954b2si19190724ejc.1006.2022.07.18.23.08.34 for <ouuuleilei@gmail.com> (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 18 Jul 2022 23:08:34 -0700 (PDT) Received-SPF: pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 8.43.85.97 as permitted sender) client-ip=8.43.85.97; Authentication-Results: mx.google.com; dkim=pass header.i=@gcc.gnu.org header.s=default header.b=cSmyyICr; spf=pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 8.43.85.97 as permitted sender) smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=gnu.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 61B973857410 for <ouuuleilei@gmail.com>; Tue, 19 Jul 2022 06:08:33 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 61B973857410 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1658210913; bh=5tS3u76xxafRjV8k7jhpjP4MbCp66NApyDkF7WhOrT4=; h=To:Subject:Date:In-Reply-To:References:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To: From; b=cSmyyICrnp+mUsMVa2Z+pPy+GuDRji2ZeYaj5OgjdAoD0jiTHqbotKfRvZHb2iyLW kfedm3DLWiqTlvUewDJY1HVHIOn4UHuaVF8WPKHTnFEW8GzXLJVRPtL2lsvfkAYoeI dMI2gkof4JF15MjCaXoyi0+NAHpLqQNVRVjgg2zA= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mga14.intel.com (mga14.intel.com [192.55.52.115]) by sourceware.org (Postfix) with ESMTPS id 77F433858428 for <gcc-patches@gcc.gnu.org>; Tue, 19 Jul 2022 06:07:49 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 77F433858428 X-IronPort-AV: E=McAfee;i="6400,9594,10412"; a="286418235" X-IronPort-AV: E=Sophos;i="5.92,283,1650956400"; d="scan'208";a="286418235" Received: from fmsmga004.fm.intel.com ([10.253.24.48]) by fmsmga103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Jul 2022 23:07:39 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.92,283,1650956400"; d="scan'208";a="665298245" Received: from shvmail03.sh.intel.com ([10.239.245.20]) by fmsmga004.fm.intel.com with ESMTP; 18 Jul 2022 23:07:38 -0700 Received: from shliclel320.sh.intel.com (shliclel320.sh.intel.com [10.239.240.127]) by shvmail03.sh.intel.com (Postfix) with ESMTP id 9D3F510056A5; Tue, 19 Jul 2022 14:07:37 +0800 (CST) To: gcc-patches@gcc.gnu.org Subject: [PATCH V2] Extend 16/32-bit vector bit_op patterns with (m, 0, i) alternative. Date: Tue, 19 Jul 2022 14:07:36 +0800 Message-Id: <20220719060736.18399-1-hongtao.liu@intel.com> X-Mailer: git-send-email 2.18.1 In-Reply-To: <CAFULd4Zha2HMY4d5qUz9rL-4a8Nd6J5s4XOiSe4yETu2z1xCBg@mail.gmail.com> References: <CAFULd4Zha2HMY4d5qUz9rL-4a8Nd6J5s4XOiSe4yETu2z1xCBg@mail.gmail.com> X-Spam-Status: No, score=-12.5 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, KAM_SHORT, SPF_HELO_NONE, SPF_NONE, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org> List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>, <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe> List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/> List-Post: <mailto:gcc-patches@gcc.gnu.org> List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help> List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>, <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe> From: liuhongt via Gcc-patches <gcc-patches@gcc.gnu.org> Reply-To: liuhongt <hongtao.liu@intel.com> Errors-To: gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org Sender: "Gcc-patches" <gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org> X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1738760168063226588?= X-GMAIL-MSGID: =?utf-8?q?1738760168063226588?= |
Series |
[V2] Extend 16/32-bit vector bit_op patterns with (m, 0, i) alternative.
|
|
Commit Message
Li, Pan2 via Gcc-patches
July 19, 2022, 6:07 a.m. UTC
And split it after reload. > You will need ix86_binary_operator_ok insn constraint here with > corresponding expander using ix86_fixup_binary_operands_no_copy to > prepare insn operands. Split define_expand with just register_operand, and allow memory/immediate in define_insn, assume combine/forwprop will do optimization. > Please use if (!register_operand (operands[2], <MODE>mode)) instead. Changed. Update patch. gcc/ChangeLog: PR target/106038 * config/i386/mmx.md (<code><mode>3): New define_expand, it's original "<code><mode>3". (*<code><mode>3): New define_insn, it's original "<code><mode>3" be extended to handle memory and immediate operand with ix86_binary_operator_ok. Also adjust define_split after it. (mmxinsnmode): New mode attribute. (*mov<mode>_imm): Refactor with mmxinsnmode. * config/i386/predicates.md (register_or_x86_64_const_vector_operand): New predicate. gcc/testsuite/ChangeLog: * gcc.target/i386/pr106038-1.c: New test. --- gcc/config/i386/mmx.md | 71 ++++++++++++---------- gcc/config/i386/predicates.md | 4 ++ gcc/testsuite/gcc.target/i386/pr106038-1.c | 27 ++++++++ 3 files changed, 71 insertions(+), 31 deletions(-) create mode 100644 gcc/testsuite/gcc.target/i386/pr106038-1.c
Comments
On Tue, Jul 19, 2022 at 8:07 AM liuhongt <hongtao.liu@intel.com> wrote: > > And split it after reload. > > > You will need ix86_binary_operator_ok insn constraint here with > > corresponding expander using ix86_fixup_binary_operands_no_copy to > > prepare insn operands. > Split define_expand with just register_operand, and allow > memory/immediate in define_insn, assume combine/forwprop will do optimization. But you will *ease* the job of the above passes if you use ix86_fixup_binary_operands_no_copy in the expander. You can already use a nonimmediate operand for operands 0 and 1 in the expander (and register_or_x86_64_const_vector_operand as operand 2) and the fixup will try to optimize and adjust the pattern. So I really don't see what you gain with the proposed approach. Uros. > > > Please use if (!register_operand (operands[2], <MODE>mode)) instead. > Changed. > > Update patch. > > gcc/ChangeLog: > > PR target/106038 > * config/i386/mmx.md (<code><mode>3): New define_expand, it's > original "<code><mode>3". > (*<code><mode>3): New define_insn, it's original > "<code><mode>3" be extended to handle memory and immediate > operand with ix86_binary_operator_ok. Also adjust define_split > after it. > (mmxinsnmode): New mode attribute. > (*mov<mode>_imm): Refactor with mmxinsnmode. > * config/i386/predicates.md > (register_or_x86_64_const_vector_operand): New predicate. > > gcc/testsuite/ChangeLog: > > * gcc.target/i386/pr106038-1.c: New test. > --- > gcc/config/i386/mmx.md | 71 ++++++++++++---------- > gcc/config/i386/predicates.md | 4 ++ > gcc/testsuite/gcc.target/i386/pr106038-1.c | 27 ++++++++ > 3 files changed, 71 insertions(+), 31 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/i386/pr106038-1.c > > diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md > index 3294c1e6274..316b83dd3ac 100644 > --- a/gcc/config/i386/mmx.md > +++ b/gcc/config/i386/mmx.md > @@ -86,6 +86,14 @@ (define_mode_attr mmxvecsize > [(V8QI "b") (V4QI "b") (V2QI "b") > (V4HI "w") (V2HI "w") (V2SI "d") (V1DI "q")]) > > +;; Mapping to same size integral mode. > +(define_mode_attr mmxinsnmode > + [(V8QI "DI") (V4QI "SI") (V2QI "HI") > + (V4HI "DI") (V2HI "SI") > + (V2SI "DI") > + (V4HF "DI") (V2HF "SI") > + (V2SF "DI")]) > + > (define_mode_attr mmxdoublemode > [(V8QI "V8HI") (V4HI "V4SI")]) > > @@ -350,22 +358,7 @@ (define_insn_and_split "*mov<mode>_imm" > HOST_WIDE_INT val = ix86_convert_const_vector_to_integer (operands[1], > <MODE>mode); > operands[1] = GEN_INT (val); > - machine_mode mode; > - switch (GET_MODE_SIZE (<MODE>mode)) > - { > - case 2: > - mode = HImode; > - break; > - case 4: > - mode = SImode; > - break; > - case 8: > - mode = DImode; > - break; > - default: > - gcc_unreachable (); > - } > - operands[0] = lowpart_subreg (mode, operands[0], <MODE>mode); > + operands[0] = lowpart_subreg (<mmxinsnmode>mode, operands[0], <MODE>mode); > }) > > ;; For TARGET_64BIT we always round up to 8 bytes. > @@ -2974,33 +2967,49 @@ (define_insn "*mmx_<code><mode>3" > (set_attr "type" "mmxadd,sselog,sselog,sselog") > (set_attr "mode" "DI,TI,TI,TI")]) > > -(define_insn "<code><mode>3" > - [(set (match_operand:VI_16_32 0 "register_operand" "=?r,x,x,v") > +(define_expand "<code><mode>3" > + [(parallel > + [(set (match_operand:VI_16_32 0 "register_operand") > + (any_logic:VI_16_32 > + (match_operand:VI_16_32 1 "register_operand") > + (match_operand:VI_16_32 2 "register_operand"))) > + (clobber (reg:CC FLAGS_REG))])] > + "") > + > +(define_insn "*<code><mode>3" > + [(set (match_operand:VI_16_32 0 "nonimmediate_operand" "=?r,m,x,x,v") > (any_logic:VI_16_32 > - (match_operand:VI_16_32 1 "register_operand" "%0,0,x,v") > - (match_operand:VI_16_32 2 "register_operand" "r,x,x,v"))) > + (match_operand:VI_16_32 1 "nonimmediate_operand" "%0,0,0,x,v") > + (match_operand:VI_16_32 2 "register_or_x86_64_const_vector_operand" "r,i,x,x,v"))) > (clobber (reg:CC FLAGS_REG))] > - "" > + "ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)" > "#" > - [(set_attr "isa" "*,sse2_noavx,avx,avx512vl") > - (set_attr "type" "alu,sselog,sselog,sselog") > - (set_attr "mode" "SI,TI,TI,TI")]) > + [(set_attr "isa" "*,*,sse2_noavx,avx,avx512vl") > + (set_attr "type" "alu,alu,sselog,sselog,sselog") > + (set_attr "mode" "SI,SI,TI,TI,TI")]) > > (define_split > - [(set (match_operand:VI_16_32 0 "general_reg_operand") > + [(set (match_operand:VI_16_32 0 "nonimmediate_gr_operand") > (any_logic:VI_16_32 > - (match_operand:VI_16_32 1 "general_reg_operand") > - (match_operand:VI_16_32 2 "general_reg_operand"))) > + (match_operand:VI_16_32 1 "nonimmediate_gr_operand") > + (match_operand:VI_16_32 2 "register_or_x86_64_const_vector_operand"))) > (clobber (reg:CC FLAGS_REG))] > "reload_completed" > [(parallel > [(set (match_dup 0) > - (any_logic:SI (match_dup 1) (match_dup 2))) > + (any_logic:<mmxinsnmode> (match_dup 1) (match_dup 2))) > (clobber (reg:CC FLAGS_REG))])] > { > - operands[2] = lowpart_subreg (SImode, operands[2], <MODE>mode); > - operands[1] = lowpart_subreg (SImode, operands[1], <MODE>mode); > - operands[0] = lowpart_subreg (SImode, operands[0], <MODE>mode); > + if (!register_operand (operands[2], <MODE>mode)) > + { > + HOST_WIDE_INT val = ix86_convert_const_vector_to_integer (operands[2], > + <MODE>mode); > + operands[2] = GEN_INT (val); > + } > + else > + operands[2] = lowpart_subreg (<mmxinsnmode>mode, operands[2], <MODE>mode); > + operands[1] = lowpart_subreg (<mmxinsnmode>mode, operands[1], <MODE>mode); > + operands[0] = lowpart_subreg (<mmxinsnmode>mode, operands[0], <MODE>mode); > }) > > (define_split > diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md > index c71c453cceb..5f63a7d52f5 100644 > --- a/gcc/config/i386/predicates.md > +++ b/gcc/config/i386/predicates.md > @@ -1205,6 +1205,10 @@ (define_predicate "x86_64_const_vector_operand" > return trunc_int_for_mode (val, SImode) == val; > }) > > +(define_predicate "register_or_x86_64_const_vector_operand" > + (ior (match_operand 0 "register_operand") > + (match_operand 0 "x86_64_const_vector_operand"))) > + > ;; Return true when OP is nonimmediate or standard SSE constant. > (define_predicate "nonimmediate_or_sse_const_operand" > (ior (match_operand 0 "nonimmediate_operand") > diff --git a/gcc/testsuite/gcc.target/i386/pr106038-1.c b/gcc/testsuite/gcc.target/i386/pr106038-1.c > new file mode 100644 > index 00000000000..bb52385c8a5 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/pr106038-1.c > @@ -0,0 +1,27 @@ > +/* { dg-do compile } */ > +/* { dg-options "-msse2 -O2" } */ > +/* { dg-final { scan-assembler-not "xmm" } } */ > + > +void > +foo3 (char* a, char* __restrict b) > +{ > + a[0] &= 1; > + a[1] &= 2; > + a[2] &= 3; > + a[3] &= 3; > +} > + > +void > +foo4 (char* a, char* __restrict b) > +{ > + a[0] &= 1; > + a[1] &= 2; > +} > + > + > +void > +foo1 (short* a, short* __restrict b) > +{ > + a[0] &= 1; > + a[1] &= 2; > +} > -- > 2.18.1 >
On Tue, Jul 19, 2022 at 2:35 PM Uros Bizjak via Gcc-patches <gcc-patches@gcc.gnu.org> wrote: > > On Tue, Jul 19, 2022 at 8:07 AM liuhongt <hongtao.liu@intel.com> wrote: > > > > And split it after reload. > > > > > You will need ix86_binary_operator_ok insn constraint here with > > > corresponding expander using ix86_fixup_binary_operands_no_copy to > > > prepare insn operands. > > Split define_expand with just register_operand, and allow > > memory/immediate in define_insn, assume combine/forwprop will do optimization. > > But you will *ease* the job of the above passes if you use > ix86_fixup_binary_operands_no_copy in the expander. for -m32, it will hit ICE in Breakpoint 1, ix86_fixup_binary_operands_no_copy (code=XOR, mode=E_V4QImode, operands=0x7fffffffa970) a /gcc/config/i386/i386-expand.cc:1184 1184 rtx dst = ix86_fixup_binary_operands (code, mode, operands); (gdb) n 1185 gcc_assert (dst == operands[0]); -- here (gdb) the original operands[0], operands[1], operands[2] are below (gdb) p debug_rtx (operands[0]) (mem/c:V4QI (plus:SI (reg/f:SI 77 virtual-stack-vars) (const_int -8220 [0xffffffffffffdfe4])) [0 MEM <vector(4) unsigned char> [(unsigned char *)&tmp2 + 4B]+0 S4 A32]) $1 = void (gdb) p debug_rtx (operands[1]) (subreg:V4QI (reg:SI 129) 0) $2 = void (gdb) p debug_rtx (operands[2]) (subreg:V4QI (reg:SI 98 [ _46 ]) 0) $3 = void (gdb) since operands[0] is mem and not equal to operands[1], ix86_fixup_binary_operands will create a pseudo register for dst. and then hit ICE. Is this a bug or assumed? > > Uros. > > > > > > Please use if (!register_operand (operands[2], <MODE>mode)) instead. > > Changed. > > > > Update patch. > > > > gcc/ChangeLog: > > > > PR target/106038 > > * config/i386/mmx.md (<code><mode>3): New define_expand, it's > > original "<code><mode>3". > > (*<code><mode>3): New define_insn, it's original > > "<code><mode>3" be extended to handle memory and immediate > > operand with ix86_binary_operator_ok. Also adjust define_split > > after it. > > (mmxinsnmode): New mode attribute. > > (*mov<mode>_imm): Refactor with mmxinsnmode. > > * config/i386/predicates.md > > (register_or_x86_64_const_vector_operand): New predicate. > > > > gcc/testsuite/ChangeLog: > > > > * gcc.target/i386/pr106038-1.c: New test. > > --- > > gcc/config/i386/mmx.md | 71 ++++++++++++---------- > > gcc/config/i386/predicates.md | 4 ++ > > gcc/testsuite/gcc.target/i386/pr106038-1.c | 27 ++++++++ > > 3 files changed, 71 insertions(+), 31 deletions(-) > > create mode 100644 gcc/testsuite/gcc.target/i386/pr106038-1.c > > > > diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md > > index 3294c1e6274..316b83dd3ac 100644 > > --- a/gcc/config/i386/mmx.md > > +++ b/gcc/config/i386/mmx.md > > @@ -86,6 +86,14 @@ (define_mode_attr mmxvecsize > > [(V8QI "b") (V4QI "b") (V2QI "b") > > (V4HI "w") (V2HI "w") (V2SI "d") (V1DI "q")]) > > > > +;; Mapping to same size integral mode. > > +(define_mode_attr mmxinsnmode > > + [(V8QI "DI") (V4QI "SI") (V2QI "HI") > > + (V4HI "DI") (V2HI "SI") > > + (V2SI "DI") > > + (V4HF "DI") (V2HF "SI") > > + (V2SF "DI")]) > > + > > (define_mode_attr mmxdoublemode > > [(V8QI "V8HI") (V4HI "V4SI")]) > > > > @@ -350,22 +358,7 @@ (define_insn_and_split "*mov<mode>_imm" > > HOST_WIDE_INT val = ix86_convert_const_vector_to_integer (operands[1], > > <MODE>mode); > > operands[1] = GEN_INT (val); > > - machine_mode mode; > > - switch (GET_MODE_SIZE (<MODE>mode)) > > - { > > - case 2: > > - mode = HImode; > > - break; > > - case 4: > > - mode = SImode; > > - break; > > - case 8: > > - mode = DImode; > > - break; > > - default: > > - gcc_unreachable (); > > - } > > - operands[0] = lowpart_subreg (mode, operands[0], <MODE>mode); > > + operands[0] = lowpart_subreg (<mmxinsnmode>mode, operands[0], <MODE>mode); > > }) > > > > ;; For TARGET_64BIT we always round up to 8 bytes. > > @@ -2974,33 +2967,49 @@ (define_insn "*mmx_<code><mode>3" > > (set_attr "type" "mmxadd,sselog,sselog,sselog") > > (set_attr "mode" "DI,TI,TI,TI")]) > > > > -(define_insn "<code><mode>3" > > - [(set (match_operand:VI_16_32 0 "register_operand" "=?r,x,x,v") > > +(define_expand "<code><mode>3" > > + [(parallel > > + [(set (match_operand:VI_16_32 0 "register_operand") > > + (any_logic:VI_16_32 > > + (match_operand:VI_16_32 1 "register_operand") > > + (match_operand:VI_16_32 2 "register_operand"))) > > + (clobber (reg:CC FLAGS_REG))])] > > + "") > > + > > +(define_insn "*<code><mode>3" > > + [(set (match_operand:VI_16_32 0 "nonimmediate_operand" "=?r,m,x,x,v") > > (any_logic:VI_16_32 > > - (match_operand:VI_16_32 1 "register_operand" "%0,0,x,v") > > - (match_operand:VI_16_32 2 "register_operand" "r,x,x,v"))) > > + (match_operand:VI_16_32 1 "nonimmediate_operand" "%0,0,0,x,v") > > + (match_operand:VI_16_32 2 "register_or_x86_64_const_vector_operand" "r,i,x,x,v"))) > > (clobber (reg:CC FLAGS_REG))] > > - "" > > + "ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)" > > "#" > > - [(set_attr "isa" "*,sse2_noavx,avx,avx512vl") > > - (set_attr "type" "alu,sselog,sselog,sselog") > > - (set_attr "mode" "SI,TI,TI,TI")]) > > + [(set_attr "isa" "*,*,sse2_noavx,avx,avx512vl") > > + (set_attr "type" "alu,alu,sselog,sselog,sselog") > > + (set_attr "mode" "SI,SI,TI,TI,TI")]) > > > > (define_split > > - [(set (match_operand:VI_16_32 0 "general_reg_operand") > > + [(set (match_operand:VI_16_32 0 "nonimmediate_gr_operand") > > (any_logic:VI_16_32 > > - (match_operand:VI_16_32 1 "general_reg_operand") > > - (match_operand:VI_16_32 2 "general_reg_operand"))) > > + (match_operand:VI_16_32 1 "nonimmediate_gr_operand") > > + (match_operand:VI_16_32 2 "register_or_x86_64_const_vector_operand"))) > > (clobber (reg:CC FLAGS_REG))] > > "reload_completed" > > [(parallel > > [(set (match_dup 0) > > - (any_logic:SI (match_dup 1) (match_dup 2))) > > + (any_logic:<mmxinsnmode> (match_dup 1) (match_dup 2))) > > (clobber (reg:CC FLAGS_REG))])] > > { > > - operands[2] = lowpart_subreg (SImode, operands[2], <MODE>mode); > > - operands[1] = lowpart_subreg (SImode, operands[1], <MODE>mode); > > - operands[0] = lowpart_subreg (SImode, operands[0], <MODE>mode); > > + if (!register_operand (operands[2], <MODE>mode)) > > + { > > + HOST_WIDE_INT val = ix86_convert_const_vector_to_integer (operands[2], > > + <MODE>mode); > > + operands[2] = GEN_INT (val); > > + } > > + else > > + operands[2] = lowpart_subreg (<mmxinsnmode>mode, operands[2], <MODE>mode); > > + operands[1] = lowpart_subreg (<mmxinsnmode>mode, operands[1], <MODE>mode); > > + operands[0] = lowpart_subreg (<mmxinsnmode>mode, operands[0], <MODE>mode); > > }) > > > > (define_split > > diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md > > index c71c453cceb..5f63a7d52f5 100644 > > --- a/gcc/config/i386/predicates.md > > +++ b/gcc/config/i386/predicates.md > > @@ -1205,6 +1205,10 @@ (define_predicate "x86_64_const_vector_operand" > > return trunc_int_for_mode (val, SImode) == val; > > }) > > > > +(define_predicate "register_or_x86_64_const_vector_operand" > > + (ior (match_operand 0 "register_operand") > > + (match_operand 0 "x86_64_const_vector_operand"))) > > + > > ;; Return true when OP is nonimmediate or standard SSE constant. > > (define_predicate "nonimmediate_or_sse_const_operand" > > (ior (match_operand 0 "nonimmediate_operand") > > diff --git a/gcc/testsuite/gcc.target/i386/pr106038-1.c b/gcc/testsuite/gcc.target/i386/pr106038-1.c > > new file mode 100644 > > index 00000000000..bb52385c8a5 > > --- /dev/null > > +++ b/gcc/testsuite/gcc.target/i386/pr106038-1.c > > @@ -0,0 +1,27 @@ > > +/* { dg-do compile } */ > > +/* { dg-options "-msse2 -O2" } */ > > +/* { dg-final { scan-assembler-not "xmm" } } */ > > + > > +void > > +foo3 (char* a, char* __restrict b) > > +{ > > + a[0] &= 1; > > + a[1] &= 2; > > + a[2] &= 3; > > + a[3] &= 3; > > +} > > + > > +void > > +foo4 (char* a, char* __restrict b) > > +{ > > + a[0] &= 1; > > + a[1] &= 2; > > +} > > + > > + > > +void > > +foo1 (short* a, short* __restrict b) > > +{ > > + a[0] &= 1; > > + a[1] &= 2; > > +} > > -- > > 2.18.1 > > -- BR, Hongtao
On Tue, Jul 19, 2022 at 8:56 AM Hongtao Liu <crazylht@gmail.com> wrote: > > On Tue, Jul 19, 2022 at 2:35 PM Uros Bizjak via Gcc-patches > <gcc-patches@gcc.gnu.org> wrote: > > > > On Tue, Jul 19, 2022 at 8:07 AM liuhongt <hongtao.liu@intel.com> wrote: > > > > > > And split it after reload. > > > > > > > You will need ix86_binary_operator_ok insn constraint here with > > > > corresponding expander using ix86_fixup_binary_operands_no_copy to > > > > prepare insn operands. > > > Split define_expand with just register_operand, and allow > > > memory/immediate in define_insn, assume combine/forwprop will do optimization. > > > > But you will *ease* the job of the above passes if you use > > ix86_fixup_binary_operands_no_copy in the expander. > for -m32, it will hit ICE in > Breakpoint 1, ix86_fixup_binary_operands_no_copy (code=XOR, > mode=E_V4QImode, operands=0x7fffffffa970) a > /gcc/config/i386/i386-expand.cc:1184 > 1184 rtx dst = ix86_fixup_binary_operands (code, mode, operands); > (gdb) n > 1185 gcc_assert (dst == operands[0]); -- here > (gdb) > > the original operands[0], operands[1], operands[2] are below > (gdb) p debug_rtx (operands[0]) > (mem/c:V4QI (plus:SI (reg/f:SI 77 virtual-stack-vars) > (const_int -8220 [0xffffffffffffdfe4])) [0 MEM <vector(4) > unsigned char> [(unsigned char *)&tmp2 + 4B]+0 S4 A32]) > $1 = void > (gdb) p debug_rtx (operands[1]) > (subreg:V4QI (reg:SI 129) 0) > $2 = void > (gdb) p debug_rtx (operands[2]) > (subreg:V4QI (reg:SI 98 [ _46 ]) 0) > $3 = void > (gdb) > > since operands[0] is mem and not equal to operands[1], > ix86_fixup_binary_operands will create a pseudo register for dst. and > then hit ICE. > Is this a bug or assumed? You will need ix86_expand_binary_operator here. Uros.
On Tue, Jul 19, 2022 at 5:37 PM Uros Bizjak <ubizjak@gmail.com> wrote: > > On Tue, Jul 19, 2022 at 8:56 AM Hongtao Liu <crazylht@gmail.com> wrote: > > > > On Tue, Jul 19, 2022 at 2:35 PM Uros Bizjak via Gcc-patches > > <gcc-patches@gcc.gnu.org> wrote: > > > > > > On Tue, Jul 19, 2022 at 8:07 AM liuhongt <hongtao.liu@intel.com> wrote: > > > > > > > > And split it after reload. > > > > > > > > > You will need ix86_binary_operator_ok insn constraint here with > > > > > corresponding expander using ix86_fixup_binary_operands_no_copy to > > > > > prepare insn operands. > > > > Split define_expand with just register_operand, and allow > > > > memory/immediate in define_insn, assume combine/forwprop will do optimization. > > > > > > But you will *ease* the job of the above passes if you use > > > ix86_fixup_binary_operands_no_copy in the expander. > > for -m32, it will hit ICE in > > Breakpoint 1, ix86_fixup_binary_operands_no_copy (code=XOR, > > mode=E_V4QImode, operands=0x7fffffffa970) a > > /gcc/config/i386/i386-expand.cc:1184 > > 1184 rtx dst = ix86_fixup_binary_operands (code, mode, operands); > > (gdb) n > > 1185 gcc_assert (dst == operands[0]); -- here > > (gdb) > > > > the original operands[0], operands[1], operands[2] are below > > (gdb) p debug_rtx (operands[0]) > > (mem/c:V4QI (plus:SI (reg/f:SI 77 virtual-stack-vars) > > (const_int -8220 [0xffffffffffffdfe4])) [0 MEM <vector(4) > > unsigned char> [(unsigned char *)&tmp2 + 4B]+0 S4 A32]) > > $1 = void > > (gdb) p debug_rtx (operands[1]) > > (subreg:V4QI (reg:SI 129) 0) > > $2 = void > > (gdb) p debug_rtx (operands[2]) > > (subreg:V4QI (reg:SI 98 [ _46 ]) 0) > > $3 = void > > (gdb) > > > > since operands[0] is mem and not equal to operands[1], > > ix86_fixup_binary_operands will create a pseudo register for dst. and > > then hit ICE. > > Is this a bug or assumed? > > You will need ix86_expand_binary_operator here. It will swap memory operand from op1 to op2 and hit ICE for unrecognized insn. What about this? -(define_insn "<code><mode>3" - [(set (match_operand:VI_16_32 0 "register_operand" "=?r,x,x,v") +(define_expand "<code><mode>3" + [(set (match_operand:VI_16_32 0 "nonimmediate_operand") (any_logic:VI_16_32 - (match_operand:VI_16_32 1 "register_operand" "%0,0,x,v") - (match_operand:VI_16_32 2 "register_operand" "r,x,x,v"))) - (clobber (reg:CC FLAGS_REG))] + (match_operand:VI_16_32 1 "nonimmediate_operand") + (match_operand:VI_16_32 2 "register_or_x86_64_const_vector_operand")))] "" +{ + rtx dst = ix86_fixup_binary_operands (<CODE>, <MODE>mode, operands); + if (MEM_P (operands[2])) + operands[2] = force_reg (<MODE>mode, operands[2]); + rtx op = gen_rtx_SET (dst, gen_rtx_fmt_ee (<CODE>, <MODE>mode, + operands[1], operands[2])); + rtx clob = gen_rtx_CLOBBER (VOIDmode, gen_rtx_REG (CCmode, FLAGS_REG)); + emit_insn (gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, op, clob))); + if (dst != operands[0]) + emit_move_insn (operands[0], dst); + DONE; +}) + > > Uros.
On Wed, Jul 20, 2022 at 4:37 AM Hongtao Liu <crazylht@gmail.com> wrote: > > On Tue, Jul 19, 2022 at 5:37 PM Uros Bizjak <ubizjak@gmail.com> wrote: > > > > On Tue, Jul 19, 2022 at 8:56 AM Hongtao Liu <crazylht@gmail.com> wrote: > > > > > > On Tue, Jul 19, 2022 at 2:35 PM Uros Bizjak via Gcc-patches > > > <gcc-patches@gcc.gnu.org> wrote: > > > > > > > > On Tue, Jul 19, 2022 at 8:07 AM liuhongt <hongtao.liu@intel.com> wrote: > > > > > > > > > > And split it after reload. > > > > > > > > > > > You will need ix86_binary_operator_ok insn constraint here with > > > > > > corresponding expander using ix86_fixup_binary_operands_no_copy to > > > > > > prepare insn operands. > > > > > Split define_expand with just register_operand, and allow > > > > > memory/immediate in define_insn, assume combine/forwprop will do optimization. > > > > > > > > But you will *ease* the job of the above passes if you use > > > > ix86_fixup_binary_operands_no_copy in the expander. > > > for -m32, it will hit ICE in > > > Breakpoint 1, ix86_fixup_binary_operands_no_copy (code=XOR, > > > mode=E_V4QImode, operands=0x7fffffffa970) a > > > /gcc/config/i386/i386-expand.cc:1184 > > > 1184 rtx dst = ix86_fixup_binary_operands (code, mode, operands); > > > (gdb) n > > > 1185 gcc_assert (dst == operands[0]); -- here > > > (gdb) > > > > > > the original operands[0], operands[1], operands[2] are below > > > (gdb) p debug_rtx (operands[0]) > > > (mem/c:V4QI (plus:SI (reg/f:SI 77 virtual-stack-vars) > > > (const_int -8220 [0xffffffffffffdfe4])) [0 MEM <vector(4) > > > unsigned char> [(unsigned char *)&tmp2 + 4B]+0 S4 A32]) > > > $1 = void > > > (gdb) p debug_rtx (operands[1]) > > > (subreg:V4QI (reg:SI 129) 0) > > > $2 = void > > > (gdb) p debug_rtx (operands[2]) > > > (subreg:V4QI (reg:SI 98 [ _46 ]) 0) > > > $3 = void > > > (gdb) > > > > > > since operands[0] is mem and not equal to operands[1], > > > ix86_fixup_binary_operands will create a pseudo register for dst. and > > > then hit ICE. > > > Is this a bug or assumed? > > > > You will need ix86_expand_binary_operator here. > It will swap memory operand from op1 to op2 and hit ICE for unrecognized insn. > > What about this? Still no good... You are using commutative operands, so the predicate of operand 2 should also allow memory. So, the predicate should be nonimmediate_or_x86_64_const_vector_operand. The intermediate insn pattern should look something like *<any_or:code><mode>_1, but with added XMM and MMX reg alternatives instead of mask regs. Uros. > > -(define_insn "<code><mode>3" > - [(set (match_operand:VI_16_32 0 "register_operand" "=?r,x,x,v") > +(define_expand "<code><mode>3" > + [(set (match_operand:VI_16_32 0 "nonimmediate_operand") > (any_logic:VI_16_32 > - (match_operand:VI_16_32 1 "register_operand" "%0,0,x,v") > - (match_operand:VI_16_32 2 "register_operand" "r,x,x,v"))) > - (clobber (reg:CC FLAGS_REG))] > + (match_operand:VI_16_32 1 "nonimmediate_operand") > + (match_operand:VI_16_32 2 > "register_or_x86_64_const_vector_operand")))] > "" > +{ > + rtx dst = ix86_fixup_binary_operands (<CODE>, <MODE>mode, operands); > + if (MEM_P (operands[2])) > + operands[2] = force_reg (<MODE>mode, operands[2]); > + rtx op = gen_rtx_SET (dst, gen_rtx_fmt_ee (<CODE>, <MODE>mode, > + operands[1], operands[2])); > + rtx clob = gen_rtx_CLOBBER (VOIDmode, gen_rtx_REG (CCmode, FLAGS_REG)); > + emit_insn (gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, op, clob))); > + if (dst != operands[0]) > + emit_move_insn (operands[0], dst); > + DONE; > +}) > + > > > > > Uros. > > > > -- > BR, > Hongtao
On Wed, Jul 20, 2022 at 8:14 AM Uros Bizjak <ubizjak@gmail.com> wrote: > > On Wed, Jul 20, 2022 at 4:37 AM Hongtao Liu <crazylht@gmail.com> wrote: > > > > On Tue, Jul 19, 2022 at 5:37 PM Uros Bizjak <ubizjak@gmail.com> wrote: > > > > > > On Tue, Jul 19, 2022 at 8:56 AM Hongtao Liu <crazylht@gmail.com> wrote: > > > > > > > > On Tue, Jul 19, 2022 at 2:35 PM Uros Bizjak via Gcc-patches > > > > <gcc-patches@gcc.gnu.org> wrote: > > > > > > > > > > On Tue, Jul 19, 2022 at 8:07 AM liuhongt <hongtao.liu@intel.com> wrote: > > > > > > > > > > > > And split it after reload. > > > > > > > > > > > > > You will need ix86_binary_operator_ok insn constraint here with > > > > > > > corresponding expander using ix86_fixup_binary_operands_no_copy to > > > > > > > prepare insn operands. > > > > > > Split define_expand with just register_operand, and allow > > > > > > memory/immediate in define_insn, assume combine/forwprop will do optimization. > > > > > > > > > > But you will *ease* the job of the above passes if you use > > > > > ix86_fixup_binary_operands_no_copy in the expander. > > > > for -m32, it will hit ICE in > > > > Breakpoint 1, ix86_fixup_binary_operands_no_copy (code=XOR, > > > > mode=E_V4QImode, operands=0x7fffffffa970) a > > > > /gcc/config/i386/i386-expand.cc:1184 > > > > 1184 rtx dst = ix86_fixup_binary_operands (code, mode, operands); > > > > (gdb) n > > > > 1185 gcc_assert (dst == operands[0]); -- here > > > > (gdb) > > > > > > > > the original operands[0], operands[1], operands[2] are below > > > > (gdb) p debug_rtx (operands[0]) > > > > (mem/c:V4QI (plus:SI (reg/f:SI 77 virtual-stack-vars) > > > > (const_int -8220 [0xffffffffffffdfe4])) [0 MEM <vector(4) > > > > unsigned char> [(unsigned char *)&tmp2 + 4B]+0 S4 A32]) > > > > $1 = void > > > > (gdb) p debug_rtx (operands[1]) > > > > (subreg:V4QI (reg:SI 129) 0) > > > > $2 = void > > > > (gdb) p debug_rtx (operands[2]) > > > > (subreg:V4QI (reg:SI 98 [ _46 ]) 0) > > > > $3 = void > > > > (gdb) > > > > > > > > since operands[0] is mem and not equal to operands[1], > > > > ix86_fixup_binary_operands will create a pseudo register for dst. and > > > > then hit ICE. > > > > Is this a bug or assumed? > > > > > > You will need ix86_expand_binary_operator here. > > It will swap memory operand from op1 to op2 and hit ICE for unrecognized insn. > > > > What about this? > > Still no good... You are using commutative operands, so the predicate > of operand 2 should also allow memory. So, the predicate should be > nonimmediate_or_x86_64_const_vector_operand. The intermediate insn > pattern should look something like *<any_or:code><mode>_1, but with > added XMM and MMX reg alternatives instead of mask regs. Alternatively, you can use UNKNOWN operator to prevent canonicalization, but then you should not use commutative constraint in the intermediate insn. I think this is the best solution. Uros. > > > > -(define_insn "<code><mode>3" > > - [(set (match_operand:VI_16_32 0 "register_operand" "=?r,x,x,v") > > +(define_expand "<code><mode>3" > > + [(set (match_operand:VI_16_32 0 "nonimmediate_operand") > > (any_logic:VI_16_32 > > - (match_operand:VI_16_32 1 "register_operand" "%0,0,x,v") > > - (match_operand:VI_16_32 2 "register_operand" "r,x,x,v"))) > > - (clobber (reg:CC FLAGS_REG))] > > + (match_operand:VI_16_32 1 "nonimmediate_operand") > > + (match_operand:VI_16_32 2 > > "register_or_x86_64_const_vector_operand")))] > > "" > > +{ > > + rtx dst = ix86_fixup_binary_operands (<CODE>, <MODE>mode, operands); > > + if (MEM_P (operands[2])) > > + operands[2] = force_reg (<MODE>mode, operands[2]); > > + rtx op = gen_rtx_SET (dst, gen_rtx_fmt_ee (<CODE>, <MODE>mode, > > + operands[1], operands[2])); > > + rtx clob = gen_rtx_CLOBBER (VOIDmode, gen_rtx_REG (CCmode, FLAGS_REG)); > > + emit_insn (gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, op, clob))); > > + if (dst != operands[0]) > > + emit_move_insn (operands[0], dst); > > + DONE; > > +}) > > + > > > > > > > > Uros. > > > > > > > > -- > > BR, > > Hongtao
On Wed, Jul 20, 2022 at 2:18 PM Uros Bizjak <ubizjak@gmail.com> wrote: > > On Wed, Jul 20, 2022 at 8:14 AM Uros Bizjak <ubizjak@gmail.com> wrote: > > > > On Wed, Jul 20, 2022 at 4:37 AM Hongtao Liu <crazylht@gmail.com> wrote: > > > > > > On Tue, Jul 19, 2022 at 5:37 PM Uros Bizjak <ubizjak@gmail.com> wrote: > > > > > > > > On Tue, Jul 19, 2022 at 8:56 AM Hongtao Liu <crazylht@gmail.com> wrote: > > > > > > > > > > On Tue, Jul 19, 2022 at 2:35 PM Uros Bizjak via Gcc-patches > > > > > <gcc-patches@gcc.gnu.org> wrote: > > > > > > > > > > > > On Tue, Jul 19, 2022 at 8:07 AM liuhongt <hongtao.liu@intel.com> wrote: > > > > > > > > > > > > > > And split it after reload. > > > > > > > > > > > > > > > You will need ix86_binary_operator_ok insn constraint here with > > > > > > > > corresponding expander using ix86_fixup_binary_operands_no_copy to > > > > > > > > prepare insn operands. > > > > > > > Split define_expand with just register_operand, and allow > > > > > > > memory/immediate in define_insn, assume combine/forwprop will do optimization. > > > > > > > > > > > > But you will *ease* the job of the above passes if you use > > > > > > ix86_fixup_binary_operands_no_copy in the expander. > > > > > for -m32, it will hit ICE in > > > > > Breakpoint 1, ix86_fixup_binary_operands_no_copy (code=XOR, > > > > > mode=E_V4QImode, operands=0x7fffffffa970) a > > > > > /gcc/config/i386/i386-expand.cc:1184 > > > > > 1184 rtx dst = ix86_fixup_binary_operands (code, mode, operands); > > > > > (gdb) n > > > > > 1185 gcc_assert (dst == operands[0]); -- here > > > > > (gdb) > > > > > > > > > > the original operands[0], operands[1], operands[2] are below > > > > > (gdb) p debug_rtx (operands[0]) > > > > > (mem/c:V4QI (plus:SI (reg/f:SI 77 virtual-stack-vars) > > > > > (const_int -8220 [0xffffffffffffdfe4])) [0 MEM <vector(4) > > > > > unsigned char> [(unsigned char *)&tmp2 + 4B]+0 S4 A32]) > > > > > $1 = void > > > > > (gdb) p debug_rtx (operands[1]) > > > > > (subreg:V4QI (reg:SI 129) 0) > > > > > $2 = void > > > > > (gdb) p debug_rtx (operands[2]) > > > > > (subreg:V4QI (reg:SI 98 [ _46 ]) 0) > > > > > $3 = void > > > > > (gdb) > > > > > > > > > > since operands[0] is mem and not equal to operands[1], > > > > > ix86_fixup_binary_operands will create a pseudo register for dst. and > > > > > then hit ICE. > > > > > Is this a bug or assumed? > > > > > > > > You will need ix86_expand_binary_operator here. > > > It will swap memory operand from op1 to op2 and hit ICE for unrecognized insn. > > > > > > What about this? > > > > Still no good... You are using commutative operands, so the predicate > > of operand 2 should also allow memory. So, the predicate should be > > nonimmediate_or_x86_64_const_vector_operand. The intermediate insn > > pattern should look something like *<any_or:code><mode>_1, but with > > added XMM and MMX reg alternatives instead of mask regs. > > Alternatively, you can use UNKNOWN operator to prevent > canonicalization, but then you should not use commutative constraint > in the intermediate insn. I think this is the best solution. Like this? -(define_insn "<code><mode>3" - [(set (match_operand:VI_16_32 0 "register_operand" "=?r,x,x,v") +(define_expand "<code><mode>3" + [(set (match_operand:VI_16_32 0 "nonimmediate_operand") (any_logic:VI_16_32 - (match_operand:VI_16_32 1 "register_operand" "%0,0,x,v") - (match_operand:VI_16_32 2 "register_operand" "r,x,x,v"))) - (clobber (reg:CC FLAGS_REG))] + (match_operand:VI_16_32 1 "nonimmediate_operand") + (match_operand:VI_16_32 2 "register_or_x86_64_const_vector_operand")))] "" +{ + rtx dst = ix86_fixup_binary_operands (<CODE>, <MODE>mode, operands); + if (MEM_P (operands[2])) + operands[2] = force_reg (<MODE>mode, operands[2]); + rtx op = gen_rtx_SET (dst, gen_rtx_fmt_ee (<CODE>, <MODE>mode, + operands[1], operands[2])); + rtx clob = gen_rtx_CLOBBER (VOIDmode, gen_rtx_REG (CCmode, FLAGS_REG)); + emit_insn (gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, op, clob))); + if (dst != operands[0]) + emit_move_insn (operands[0], dst); + DONE; +}) + +(define_insn "*<code><mode>3" + [(set (match_operand:VI_16_32 0 "nonimmediate_operand" "=?r,m,x,x,v") + (any_logic:VI_16_32 + (match_operand:VI_16_32 1 "nonimmediate_operand" "0,0,0,x,v") + (match_operand:VI_16_32 2 "register_or_x86_64_const_vector_operand" "r,i,x,x,v"))) + (clobber (reg:CC FLAGS_REG))] + "ix86_binary_operator_ok (UNKNOWN, <MODE>mode, operands)" "#" - [(set_attr "isa" "*,sse2_noavx,avx,avx512vl") - (set_attr "type" "alu,sselog,sselog,sselog") - (set_attr "mode" "SI,TI,TI,TI")]) + [(set_attr "isa" "*,*,sse2_noavx,avx,avx512vl") + (set_attr "type" "alu,alu,sselog,sselog,sselog") + (set_attr "mode" "SI,SI,TI,TI,TI")]) > > Uros. > > > > > > > -(define_insn "<code><mode>3" > > > - [(set (match_operand:VI_16_32 0 "register_operand" "=?r,x,x,v") > > > +(define_expand "<code><mode>3" > > > + [(set (match_operand:VI_16_32 0 "nonimmediate_operand") > > > (any_logic:VI_16_32 > > > - (match_operand:VI_16_32 1 "register_operand" "%0,0,x,v") > > > - (match_operand:VI_16_32 2 "register_operand" "r,x,x,v"))) > > > - (clobber (reg:CC FLAGS_REG))] > > > + (match_operand:VI_16_32 1 "nonimmediate_operand") > > > + (match_operand:VI_16_32 2 > > > "register_or_x86_64_const_vector_operand")))] > > > "" > > > +{ > > > + rtx dst = ix86_fixup_binary_operands (<CODE>, <MODE>mode, operands); > > > + if (MEM_P (operands[2])) > > > + operands[2] = force_reg (<MODE>mode, operands[2]); > > > + rtx op = gen_rtx_SET (dst, gen_rtx_fmt_ee (<CODE>, <MODE>mode, > > > + operands[1], operands[2])); > > > + rtx clob = gen_rtx_CLOBBER (VOIDmode, gen_rtx_REG (CCmode, FLAGS_REG)); > > > + emit_insn (gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, op, clob))); > > > + if (dst != operands[0]) > > > + emit_move_insn (operands[0], dst); > > > + DONE; > > > +}) > > > + > > > > > > > > > > > Uros. > > > > > > > > > > > > -- > > > BR, > > > Hongtao
On Wed, Jul 20, 2022 at 8:54 AM Hongtao Liu <crazylht@gmail.com> wrote: > > On Wed, Jul 20, 2022 at 2:18 PM Uros Bizjak <ubizjak@gmail.com> wrote: > > > > On Wed, Jul 20, 2022 at 8:14 AM Uros Bizjak <ubizjak@gmail.com> wrote: > > > > > > On Wed, Jul 20, 2022 at 4:37 AM Hongtao Liu <crazylht@gmail.com> wrote: > > > > > > > > On Tue, Jul 19, 2022 at 5:37 PM Uros Bizjak <ubizjak@gmail.com> wrote: > > > > > > > > > > On Tue, Jul 19, 2022 at 8:56 AM Hongtao Liu <crazylht@gmail.com> wrote: > > > > > > > > > > > > On Tue, Jul 19, 2022 at 2:35 PM Uros Bizjak via Gcc-patches > > > > > > <gcc-patches@gcc.gnu.org> wrote: > > > > > > > > > > > > > > On Tue, Jul 19, 2022 at 8:07 AM liuhongt <hongtao.liu@intel.com> wrote: > > > > > > > > > > > > > > > > And split it after reload. > > > > > > > > > > > > > > > > > You will need ix86_binary_operator_ok insn constraint here with > > > > > > > > > corresponding expander using ix86_fixup_binary_operands_no_copy to > > > > > > > > > prepare insn operands. > > > > > > > > Split define_expand with just register_operand, and allow > > > > > > > > memory/immediate in define_insn, assume combine/forwprop will do optimization. > > > > > > > > > > > > > > But you will *ease* the job of the above passes if you use > > > > > > > ix86_fixup_binary_operands_no_copy in the expander. > > > > > > for -m32, it will hit ICE in > > > > > > Breakpoint 1, ix86_fixup_binary_operands_no_copy (code=XOR, > > > > > > mode=E_V4QImode, operands=0x7fffffffa970) a > > > > > > /gcc/config/i386/i386-expand.cc:1184 > > > > > > 1184 rtx dst = ix86_fixup_binary_operands (code, mode, operands); > > > > > > (gdb) n > > > > > > 1185 gcc_assert (dst == operands[0]); -- here > > > > > > (gdb) > > > > > > > > > > > > the original operands[0], operands[1], operands[2] are below > > > > > > (gdb) p debug_rtx (operands[0]) > > > > > > (mem/c:V4QI (plus:SI (reg/f:SI 77 virtual-stack-vars) > > > > > > (const_int -8220 [0xffffffffffffdfe4])) [0 MEM <vector(4) > > > > > > unsigned char> [(unsigned char *)&tmp2 + 4B]+0 S4 A32]) > > > > > > $1 = void > > > > > > (gdb) p debug_rtx (operands[1]) > > > > > > (subreg:V4QI (reg:SI 129) 0) > > > > > > $2 = void > > > > > > (gdb) p debug_rtx (operands[2]) > > > > > > (subreg:V4QI (reg:SI 98 [ _46 ]) 0) > > > > > > $3 = void > > > > > > (gdb) > > > > > > > > > > > > since operands[0] is mem and not equal to operands[1], > > > > > > ix86_fixup_binary_operands will create a pseudo register for dst. and > > > > > > then hit ICE. > > > > > > Is this a bug or assumed? > > > > > > > > > > You will need ix86_expand_binary_operator here. > > > > It will swap memory operand from op1 to op2 and hit ICE for unrecognized insn. > > > > > > > > What about this? > > > > > > Still no good... You are using commutative operands, so the predicate > > > of operand 2 should also allow memory. So, the predicate should be > > > nonimmediate_or_x86_64_const_vector_operand. The intermediate insn > > > pattern should look something like *<any_or:code><mode>_1, but with > > > added XMM and MMX reg alternatives instead of mask regs. > > > > Alternatively, you can use UNKNOWN operator to prevent > > canonicalization, but then you should not use commutative constraint > > in the intermediate insn. I think this is the best solution. > Like this? Please check the attached (lightly tested) patch that keeps commutative operands. Uros. diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md index 3294c1e6274..0a39d396430 100644 --- a/gcc/config/i386/mmx.md +++ b/gcc/config/i386/mmx.md @@ -86,6 +86,14 @@ [(V8QI "b") (V4QI "b") (V2QI "b") (V4HI "w") (V2HI "w") (V2SI "d") (V1DI "q")]) +;; Mapping to same size integral mode. +(define_mode_attr mmxinsnmode + [(V8QI "DI") (V4QI "SI") (V2QI "HI") + (V4HI "DI") (V2HI "SI") + (V2SI "DI") + (V4HF "DI") (V2HF "SI") + (V2SF "DI")]) + (define_mode_attr mmxdoublemode [(V8QI "V8HI") (V4HI "V4SI")]) @@ -350,22 +358,7 @@ HOST_WIDE_INT val = ix86_convert_const_vector_to_integer (operands[1], <MODE>mode); operands[1] = GEN_INT (val); - machine_mode mode; - switch (GET_MODE_SIZE (<MODE>mode)) - { - case 2: - mode = HImode; - break; - case 4: - mode = SImode; - break; - case 8: - mode = DImode; - break; - default: - gcc_unreachable (); - } - operands[0] = lowpart_subreg (mode, operands[0], <MODE>mode); + operands[0] = lowpart_subreg (<mmxinsnmode>mode, operands[0], <MODE>mode); }) ;; For TARGET_64BIT we always round up to 8 bytes. @@ -2974,33 +2967,50 @@ (set_attr "type" "mmxadd,sselog,sselog,sselog") (set_attr "mode" "DI,TI,TI,TI")]) -(define_insn "<code><mode>3" - [(set (match_operand:VI_16_32 0 "register_operand" "=?r,x,x,v") +(define_expand "<code><mode>3" + [(parallel + [(set (match_operand:VI_16_32 0 "nonimmediate_operand") (any_logic:VI_16_32 - (match_operand:VI_16_32 1 "register_operand" "%0,0,x,v") - (match_operand:VI_16_32 2 "register_operand" "r,x,x,v"))) - (clobber (reg:CC FLAGS_REG))] + (match_operand:VI_16_32 1 "nonimmediate_operand") + (match_operand:VI_16_32 2 "nonimmediate_or_x86_64_const_vector_operand"))) + (clobber (reg:CC FLAGS_REG))])] "" + "ix86_expand_binary_operator (<CODE>, <MODE>mode, operands); DONE;") + +(define_insn "*<code><mode>3" + [(set (match_operand:VI_16_32 0 "nonimmediate_operand" "=?r,m,x,x,v") + (any_logic:VI_16_32 + (match_operand:VI_16_32 1 "nonimmediate_operand" "%0,0,0,x,v") + (match_operand:VI_16_32 2 "nonimmediate_or_x86_64_const_vector_operand" "r,i,x,x,v"))) + (clobber (reg:CC FLAGS_REG))] + "ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)" "#" - [(set_attr "isa" "*,sse2_noavx,avx,avx512vl") - (set_attr "type" "alu,sselog,sselog,sselog") - (set_attr "mode" "SI,TI,TI,TI")]) + [(set_attr "isa" "*,*,sse2_noavx,avx,avx512vl") + (set_attr "type" "alu,alu,sselog,sselog,sselog") + (set_attr "mode" "SI,SI,TI,TI,TI")]) (define_split - [(set (match_operand:VI_16_32 0 "general_reg_operand") + [(set (match_operand:VI_16_32 0 "nonimmediate_gr_operand") (any_logic:VI_16_32 - (match_operand:VI_16_32 1 "general_reg_operand") - (match_operand:VI_16_32 2 "general_reg_operand"))) + (match_operand:VI_16_32 1 "nonimmediate_gr_operand") + (match_operand:VI_16_32 2 "reg_or_const_vector_operand"))) (clobber (reg:CC FLAGS_REG))] "reload_completed" [(parallel [(set (match_dup 0) - (any_logic:SI (match_dup 1) (match_dup 2))) + (any_logic:<mmxinsnmode> (match_dup 1) (match_dup 2))) (clobber (reg:CC FLAGS_REG))])] { - operands[2] = lowpart_subreg (SImode, operands[2], <MODE>mode); - operands[1] = lowpart_subreg (SImode, operands[1], <MODE>mode); - operands[0] = lowpart_subreg (SImode, operands[0], <MODE>mode); + if (!register_operand (operands[2], <MODE>mode)) + { + HOST_WIDE_INT val = ix86_convert_const_vector_to_integer (operands[2], + <MODE>mode); + operands[2] = GEN_INT (val); + } + else + operands[2] = lowpart_subreg (<mmxinsnmode>mode, operands[2], <MODE>mode); + operands[1] = lowpart_subreg (<mmxinsnmode>mode, operands[1], <MODE>mode); + operands[0] = lowpart_subreg (<mmxinsnmode>mode, operands[0], <MODE>mode); }) (define_split diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md index 42053ea7209..064596d9594 100644 --- a/gcc/config/i386/predicates.md +++ b/gcc/config/i386/predicates.md @@ -1209,6 +1209,10 @@ return trunc_int_for_mode (val, SImode) == val; }) +(define_predicate "nonimmediate_or_x86_64_const_vector_operand" + (ior (match_operand 0 "nonimmediate_operand") + (match_operand 0 "x86_64_const_vector_operand"))) + ;; Return true when OP is nonimmediate or standard SSE constant. (define_predicate "nonimmediate_or_sse_const_operand" (ior (match_operand 0 "nonimmediate_operand")
On Wed, Jul 20, 2022 at 3:18 PM Uros Bizjak <ubizjak@gmail.com> wrote: > > On Wed, Jul 20, 2022 at 8:54 AM Hongtao Liu <crazylht@gmail.com> wrote: > > > > On Wed, Jul 20, 2022 at 2:18 PM Uros Bizjak <ubizjak@gmail.com> wrote: > > > > > > On Wed, Jul 20, 2022 at 8:14 AM Uros Bizjak <ubizjak@gmail.com> wrote: > > > > > > > > On Wed, Jul 20, 2022 at 4:37 AM Hongtao Liu <crazylht@gmail.com> wrote: > > > > > > > > > > On Tue, Jul 19, 2022 at 5:37 PM Uros Bizjak <ubizjak@gmail.com> wrote: > > > > > > > > > > > > On Tue, Jul 19, 2022 at 8:56 AM Hongtao Liu <crazylht@gmail.com> wrote: > > > > > > > > > > > > > > On Tue, Jul 19, 2022 at 2:35 PM Uros Bizjak via Gcc-patches > > > > > > > <gcc-patches@gcc.gnu.org> wrote: > > > > > > > > > > > > > > > > On Tue, Jul 19, 2022 at 8:07 AM liuhongt <hongtao.liu@intel.com> wrote: > > > > > > > > > > > > > > > > > > And split it after reload. > > > > > > > > > > > > > > > > > > > You will need ix86_binary_operator_ok insn constraint here with > > > > > > > > > > corresponding expander using ix86_fixup_binary_operands_no_copy to > > > > > > > > > > prepare insn operands. > > > > > > > > > Split define_expand with just register_operand, and allow > > > > > > > > > memory/immediate in define_insn, assume combine/forwprop will do optimization. > > > > > > > > > > > > > > > > But you will *ease* the job of the above passes if you use > > > > > > > > ix86_fixup_binary_operands_no_copy in the expander. > > > > > > > for -m32, it will hit ICE in > > > > > > > Breakpoint 1, ix86_fixup_binary_operands_no_copy (code=XOR, > > > > > > > mode=E_V4QImode, operands=0x7fffffffa970) a > > > > > > > /gcc/config/i386/i386-expand.cc:1184 > > > > > > > 1184 rtx dst = ix86_fixup_binary_operands (code, mode, operands); > > > > > > > (gdb) n > > > > > > > 1185 gcc_assert (dst == operands[0]); -- here > > > > > > > (gdb) > > > > > > > > > > > > > > the original operands[0], operands[1], operands[2] are below > > > > > > > (gdb) p debug_rtx (operands[0]) > > > > > > > (mem/c:V4QI (plus:SI (reg/f:SI 77 virtual-stack-vars) > > > > > > > (const_int -8220 [0xffffffffffffdfe4])) [0 MEM <vector(4) > > > > > > > unsigned char> [(unsigned char *)&tmp2 + 4B]+0 S4 A32]) > > > > > > > $1 = void > > > > > > > (gdb) p debug_rtx (operands[1]) > > > > > > > (subreg:V4QI (reg:SI 129) 0) > > > > > > > $2 = void > > > > > > > (gdb) p debug_rtx (operands[2]) > > > > > > > (subreg:V4QI (reg:SI 98 [ _46 ]) 0) > > > > > > > $3 = void > > > > > > > (gdb) > > > > > > > > > > > > > > since operands[0] is mem and not equal to operands[1], > > > > > > > ix86_fixup_binary_operands will create a pseudo register for dst. and > > > > > > > then hit ICE. > > > > > > > Is this a bug or assumed? > > > > > > > > > > > > You will need ix86_expand_binary_operator here. > > > > > It will swap memory operand from op1 to op2 and hit ICE for unrecognized insn. > > > > > > > > > > What about this? > > > > > > > > Still no good... You are using commutative operands, so the predicate > > > > of operand 2 should also allow memory. So, the predicate should be > > > > nonimmediate_or_x86_64_const_vector_operand. The intermediate insn > > > > pattern should look something like *<any_or:code><mode>_1, but with > > > > added XMM and MMX reg alternatives instead of mask regs. > > > > > > Alternatively, you can use UNKNOWN operator to prevent > > > canonicalization, but then you should not use commutative constraint > > > in the intermediate insn. I think this is the best solution. > > Like this? > > Please check the attached (lightly tested) patch that keeps > commutative operands. Yes, it looks best, I'll fully test the patch. > > Uros.
diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md index 3294c1e6274..316b83dd3ac 100644 --- a/gcc/config/i386/mmx.md +++ b/gcc/config/i386/mmx.md @@ -86,6 +86,14 @@ (define_mode_attr mmxvecsize [(V8QI "b") (V4QI "b") (V2QI "b") (V4HI "w") (V2HI "w") (V2SI "d") (V1DI "q")]) +;; Mapping to same size integral mode. +(define_mode_attr mmxinsnmode + [(V8QI "DI") (V4QI "SI") (V2QI "HI") + (V4HI "DI") (V2HI "SI") + (V2SI "DI") + (V4HF "DI") (V2HF "SI") + (V2SF "DI")]) + (define_mode_attr mmxdoublemode [(V8QI "V8HI") (V4HI "V4SI")]) @@ -350,22 +358,7 @@ (define_insn_and_split "*mov<mode>_imm" HOST_WIDE_INT val = ix86_convert_const_vector_to_integer (operands[1], <MODE>mode); operands[1] = GEN_INT (val); - machine_mode mode; - switch (GET_MODE_SIZE (<MODE>mode)) - { - case 2: - mode = HImode; - break; - case 4: - mode = SImode; - break; - case 8: - mode = DImode; - break; - default: - gcc_unreachable (); - } - operands[0] = lowpart_subreg (mode, operands[0], <MODE>mode); + operands[0] = lowpart_subreg (<mmxinsnmode>mode, operands[0], <MODE>mode); }) ;; For TARGET_64BIT we always round up to 8 bytes. @@ -2974,33 +2967,49 @@ (define_insn "*mmx_<code><mode>3" (set_attr "type" "mmxadd,sselog,sselog,sselog") (set_attr "mode" "DI,TI,TI,TI")]) -(define_insn "<code><mode>3" - [(set (match_operand:VI_16_32 0 "register_operand" "=?r,x,x,v") +(define_expand "<code><mode>3" + [(parallel + [(set (match_operand:VI_16_32 0 "register_operand") + (any_logic:VI_16_32 + (match_operand:VI_16_32 1 "register_operand") + (match_operand:VI_16_32 2 "register_operand"))) + (clobber (reg:CC FLAGS_REG))])] + "") + +(define_insn "*<code><mode>3" + [(set (match_operand:VI_16_32 0 "nonimmediate_operand" "=?r,m,x,x,v") (any_logic:VI_16_32 - (match_operand:VI_16_32 1 "register_operand" "%0,0,x,v") - (match_operand:VI_16_32 2 "register_operand" "r,x,x,v"))) + (match_operand:VI_16_32 1 "nonimmediate_operand" "%0,0,0,x,v") + (match_operand:VI_16_32 2 "register_or_x86_64_const_vector_operand" "r,i,x,x,v"))) (clobber (reg:CC FLAGS_REG))] - "" + "ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)" "#" - [(set_attr "isa" "*,sse2_noavx,avx,avx512vl") - (set_attr "type" "alu,sselog,sselog,sselog") - (set_attr "mode" "SI,TI,TI,TI")]) + [(set_attr "isa" "*,*,sse2_noavx,avx,avx512vl") + (set_attr "type" "alu,alu,sselog,sselog,sselog") + (set_attr "mode" "SI,SI,TI,TI,TI")]) (define_split - [(set (match_operand:VI_16_32 0 "general_reg_operand") + [(set (match_operand:VI_16_32 0 "nonimmediate_gr_operand") (any_logic:VI_16_32 - (match_operand:VI_16_32 1 "general_reg_operand") - (match_operand:VI_16_32 2 "general_reg_operand"))) + (match_operand:VI_16_32 1 "nonimmediate_gr_operand") + (match_operand:VI_16_32 2 "register_or_x86_64_const_vector_operand"))) (clobber (reg:CC FLAGS_REG))] "reload_completed" [(parallel [(set (match_dup 0) - (any_logic:SI (match_dup 1) (match_dup 2))) + (any_logic:<mmxinsnmode> (match_dup 1) (match_dup 2))) (clobber (reg:CC FLAGS_REG))])] { - operands[2] = lowpart_subreg (SImode, operands[2], <MODE>mode); - operands[1] = lowpart_subreg (SImode, operands[1], <MODE>mode); - operands[0] = lowpart_subreg (SImode, operands[0], <MODE>mode); + if (!register_operand (operands[2], <MODE>mode)) + { + HOST_WIDE_INT val = ix86_convert_const_vector_to_integer (operands[2], + <MODE>mode); + operands[2] = GEN_INT (val); + } + else + operands[2] = lowpart_subreg (<mmxinsnmode>mode, operands[2], <MODE>mode); + operands[1] = lowpart_subreg (<mmxinsnmode>mode, operands[1], <MODE>mode); + operands[0] = lowpart_subreg (<mmxinsnmode>mode, operands[0], <MODE>mode); }) (define_split diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md index c71c453cceb..5f63a7d52f5 100644 --- a/gcc/config/i386/predicates.md +++ b/gcc/config/i386/predicates.md @@ -1205,6 +1205,10 @@ (define_predicate "x86_64_const_vector_operand" return trunc_int_for_mode (val, SImode) == val; }) +(define_predicate "register_or_x86_64_const_vector_operand" + (ior (match_operand 0 "register_operand") + (match_operand 0 "x86_64_const_vector_operand"))) + ;; Return true when OP is nonimmediate or standard SSE constant. (define_predicate "nonimmediate_or_sse_const_operand" (ior (match_operand 0 "nonimmediate_operand") diff --git a/gcc/testsuite/gcc.target/i386/pr106038-1.c b/gcc/testsuite/gcc.target/i386/pr106038-1.c new file mode 100644 index 00000000000..bb52385c8a5 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr106038-1.c @@ -0,0 +1,27 @@ +/* { dg-do compile } */ +/* { dg-options "-msse2 -O2" } */ +/* { dg-final { scan-assembler-not "xmm" } } */ + +void +foo3 (char* a, char* __restrict b) +{ + a[0] &= 1; + a[1] &= 2; + a[2] &= 3; + a[3] &= 3; +} + +void +foo4 (char* a, char* __restrict b) +{ + a[0] &= 1; + a[1] &= 2; +} + + +void +foo1 (short* a, short* __restrict b) +{ + a[0] &= 1; + a[1] &= 2; +}