From patchwork Tue Oct 11 08:03:16 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: liuhongt X-Patchwork-Id: 1908 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:4ac7:0:0:0:0:0 with SMTP id y7csp1969827wrs; Tue, 11 Oct 2022 01:04:44 -0700 (PDT) X-Google-Smtp-Source: AMsMyM6Zly0vSEoKw3w5tZdjctA4aITb3FtqB87YnNg5YgDKZf2xjQu9Wc2GE4qu5yNgJfHNggCJ X-Received: by 2002:a05:6402:2743:b0:459:1914:493d with SMTP id z3-20020a056402274300b004591914493dmr21794580edd.361.1665475484074; Tue, 11 Oct 2022 01:04:44 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1665475484; cv=none; d=google.com; s=arc-20160816; b=sNspce7j7JFBYH+9vbGq3YYtppTJ/16/X2C5tUr00aMKIpELmn/crcYcKVzNb1vz9e MD0Aud0P2LBqSVR4IwJwPbL+r0O+7fJ9BrZT8BWSuLHoozlfnfsWpJbqmKovLMdCuZx6 yTsxmmSuD+sGCA3xyb6RBL+5m7qXEGtiIC/YeghjqWLOa9WsIPte7oCPqUTAV0rG8hz3 Zcq/ar/NjAe4nXSJ3TX21PrYmw2CsqX1jfU2voO+wKuq3jaWV5LtHYOXrkbrQ2pR9W4G aOUnZXyI85K+Jnq5DP8uFhDQh35nB8p0iyMAqtpqtDXCg3SNEQT68o36Ikdl1gnbwisS Cu/g== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:reply-to:from:list-subscribe:list-help:list-post :list-archive:list-unsubscribe:list-id:precedence :content-transfer-encoding:mime-version:message-id:date:subject:to :dmarc-filter:delivered-to:dkim-signature:dkim-filter; bh=i7CfmYrq0DSiTh0A3mTQSFG/qRkGv65m6drpO63s4ik=; b=leJCQsDXqVE598yD3997EnGaAg/yhvPG4e+7O22XtRZdoNUb5/57YTDNOKU9HlALrB pbIC/rhKhOlGoSqY0DIumBZ52FF+xK9kwPnZFz6pgJ5vaGIoX5Q4gtAsCM6Tlob3Ie4Q lwAwPTcqNqhYfJCAQS65cwAVr+9ARzrUtodNBkVG9l5b/f24vqxVVj7xFOgquw63mQ1F 6CWZKAvJhIJuIvVcTdBs5ig3D0rHbTMo1B3uAs8yuzwMUuK8bxA/iKEeiicEQonxerVc k8kOBThXrw0yP4JVVmrVA43Sy/FObKL51QxCFGvhb64e4HmUXcGcwlW6Cj14XG67naqZ x6yw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gcc.gnu.org header.s=default header.b=Y4EYu3T4; spf=pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 2620:52:3:1:0:246e:9693:128c as permitted sender) smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=gnu.org Received: from sourceware.org (server2.sourceware.org. [2620:52:3:1:0:246e:9693:128c]) by mx.google.com with ESMTPS id g14-20020a056402090e00b004585f904a4esi15364954edz.360.2022.10.11.01.04.43 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 11 Oct 2022 01:04:44 -0700 (PDT) Received-SPF: pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 2620:52:3:1:0:246e:9693:128c as permitted sender) client-ip=2620:52:3:1:0:246e:9693:128c; Authentication-Results: mx.google.com; dkim=pass header.i=@gcc.gnu.org header.s=default header.b=Y4EYu3T4; spf=pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 2620:52:3:1:0:246e:9693:128c as permitted sender) smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=gnu.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 66B4C3857379 for ; Tue, 11 Oct 2022 08:04:42 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 66B4C3857379 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1665475482; bh=i7CfmYrq0DSiTh0A3mTQSFG/qRkGv65m6drpO63s4ik=; h=To:Subject:Date:List-Id:List-Unsubscribe:List-Archive:List-Post: List-Help:List-Subscribe:From:Reply-To:From; b=Y4EYu3T4qkBfAA71XSC6ZB28BQlsBnGZR3D3kEK1dV49V3wADCp4t8Q1yANSwllWn Y8grgwX8SfJv3y3hrqNUuP3S8yNK/EU2OSyacJrbB6xsjK8RiOc3NzHb26nJDX5Z7N xvWpVkRR5zIepYqgZOBFCBO5Ksr9lZfpSU6I0Ft8= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by sourceware.org (Postfix) with ESMTPS id E83163858C2D for ; Tue, 11 Oct 2022 08:03:21 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org E83163858C2D X-IronPort-AV: E=McAfee;i="6500,9779,10496"; a="306062411" X-IronPort-AV: E=Sophos;i="5.95,175,1661842800"; d="scan'208";a="306062411" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Oct 2022 01:03:19 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6500,9779,10496"; a="730883098" X-IronPort-AV: E=Sophos;i="5.95,175,1661842800"; d="scan'208";a="730883098" Received: from shvmail03.sh.intel.com ([10.239.245.20]) by fmsmga002.fm.intel.com with ESMTP; 11 Oct 2022 01:03:17 -0700 Received: from shliclel4051.sh.intel.com (shliclel4051.sh.intel.com [10.239.240.51]) by shvmail03.sh.intel.com (Postfix) with ESMTP id 13E67100AC43; Tue, 11 Oct 2022 16:03:16 +0800 (CST) To: gcc-patches@gcc.gnu.org Subject: [PATCH] [x86] Add define_insn_and_split to support general version of "kxnor". Date: Tue, 11 Oct 2022 16:03:16 +0800 Message-Id: <20221011080316.1778261-1-hongtao.liu@intel.com> X-Mailer: git-send-email 2.27.0 MIME-Version: 1.0 X-Spam-Status: No, score=-12.2 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, KAM_SHORT, SPF_HELO_NONE, SPF_NONE, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: liuhongt via Gcc-patches From: liuhongt Reply-To: liuhongt Errors-To: gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org Sender: "Gcc-patches" X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1746377621125517379?= X-GMAIL-MSGID: =?utf-8?q?1746377621125517379?= For genereal_reg_operand, it will be splitted into xor + not. For mask_reg_operand, it will be splitted with UNSPEC_MASK_OP just like what we did for other logic operations. The patch will optimize xor+not to kxnor when possible. Bootstrapped and regtested on x86_64-pc-linux-gnu. Ok for trunk? gcc/ChangeLog: * config/i386/i386.md (*notxor_1): New post_reload define_insn_and_split. (*notxorqi_1): Ditto. gcc/testsuite/ChangeLog: * gcc.target/i386/pr107093.c: New test. --- gcc/config/i386/i386.md | 71 ++++++++++++++++++++++++ gcc/testsuite/gcc.target/i386/pr107093.c | 38 +++++++++++++ 2 files changed, 109 insertions(+) create mode 100644 gcc/testsuite/gcc.target/i386/pr107093.c diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index 1be9b669909..228edba2b40 100644 --- a/gcc/config/i386/i386.md +++ b/gcc/config/i386/i386.md @@ -10826,6 +10826,39 @@ (define_insn "*_1" (set_attr "type" "alu, alu, msklog") (set_attr "mode" "")]) +(define_insn_and_split "*notxor_1" + [(set (match_operand:SWI248 0 "nonimmediate_operand" "=rm,r,?k") + (not:SWI248 + (xor:SWI248 + (match_operand:SWI248 1 "nonimmediate_operand" "%0,0,k") + (match_operand:SWI248 2 "" "r,,k")))) + (clobber (reg:CC FLAGS_REG))] + "ix86_binary_operator_ok (XOR, mode, operands)" + "#" + "&& reload_completed" + [(parallel + [(set (match_dup 0) + (xor:SWI248 (match_dup 1) (match_dup 2))) + (clobber (reg:CC FLAGS_REG))]) + (set (match_dup 0) + (not:SWI248 (match_dup 1)))] +{ + if (MASK_REGNO_P (REGNO (operands[0]))) + { + emit_insn (gen_kxnor (operands[0], operands[1], operands[2])); + DONE; + } +} + [(set (attr "isa") + (cond [(eq_attr "alternative" "2") + (if_then_else (eq_attr "mode" "SI,DI") + (const_string "avx512bw") + (const_string "avx512f")) + ] + (const_string "*"))) + (set_attr "type" "alu, alu, msklog") + (set_attr "mode" "")]) + (define_insn_and_split "*iordi_1_bts" [(set (match_operand:DI 0 "nonimmediate_operand" "=rm") (ior:DI @@ -10959,6 +10992,44 @@ (define_insn "*qi_1" (symbol_ref "!TARGET_PARTIAL_REG_STALL")] (symbol_ref "true")))]) +(define_insn_and_split "*notxorqi_1" + [(set (match_operand:QI 0 "nonimmediate_operand" "=qm,q,r,?k") + (not:QI + (xor:QI (match_operand:QI 1 "nonimmediate_operand" "%0,0,0,k") + (match_operand:QI 2 "general_operand" "qn,m,rn,k")))) + (clobber (reg:CC FLAGS_REG))] + "ix86_binary_operator_ok (XOR, QImode, operands)" + "#" + "&& reload_completed" + [(parallel + [(set (match_dup 0) + (xor:QI (match_dup 1) (match_dup 2))) + (clobber (reg:CC FLAGS_REG))]) + (set (match_dup 0) + (not:QI (match_dup 0)))] +{ + if (mask_reg_operand (operands[0], QImode)) + { + emit_insn (gen_kxnorqi (operands[0], operands[1], operands[2])); + DONE; + } +} + [(set_attr "isa" "*,*,*,avx512f") + (set_attr "type" "alu,alu,alu,msklog") + (set (attr "mode") + (cond [(eq_attr "alternative" "2") + (const_string "SI") + (and (eq_attr "alternative" "3") + (match_test "!TARGET_AVX512DQ")) + (const_string "HI") + ] + (const_string "QI"))) + ;; Potential partial reg stall on alternative 2. + (set (attr "preferred_for_speed") + (cond [(eq_attr "alternative" "2") + (symbol_ref "!TARGET_PARTIAL_REG_STALL")] + (symbol_ref "true")))]) + ;; Alternative 1 is needed to work around LRA limitation, see PR82524. (define_insn_and_split "*_1_slp" [(set (strict_low_part (match_operand:SWI12 0 "register_operand" "+,&")) diff --git a/gcc/testsuite/gcc.target/i386/pr107093.c b/gcc/testsuite/gcc.target/i386/pr107093.c new file mode 100644 index 00000000000..23e30cbac0f --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr107093.c @@ -0,0 +1,38 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512bw -O2 -mavx512vl" } */ +/* { dg-final { scan-assembler-times {(?n)kxnor[bwqd]} 4 { target { ! ia32 } } } } */ +/* { dg-final { scan-assembler-times {(?n)kxnor[bwdq]} 3 { target ia32 } } } */ + +#include + +__m512i +foo (__m512i a, __m512i b, __m512i c, __m512i d) +{ + __mmask32 k1 = _mm512_cmp_epi16_mask (a, b, 1); + __mmask32 k2 = _mm512_cmp_epi16_mask (c, d, 2); + return _mm512_mask_mov_epi16 (a, ~(k1 ^ k2), c); +} + +__m512i +foo1 (__m512i a, __m512i b, __m512i c, __m512i d) +{ + __mmask16 k1 = _mm512_cmp_epi32_mask (a, b, 1); + __mmask16 k2 = _mm512_cmp_epi32_mask (c, d, 2); + return _mm512_mask_mov_epi32 (a, ~(k1 ^ k2), c); +} + +__m512i +foo2 (__m512i a, __m512i b, __m512i c, __m512i d) +{ + __mmask64 k1 = _mm512_cmp_epi8_mask (a, b, 1); + __mmask64 k2 = _mm512_cmp_epi8_mask (c, d, 2); + return _mm512_mask_mov_epi8 (a, ~(k1 ^ k2), c); +} + +__m512i +foo3 (__m512i a, __m512i b, __m512i c, __m512i d) +{ + __mmask8 k1 = _mm512_cmp_epi64_mask (a, b, 1); + __mmask8 k2 = _mm512_cmp_epi64_mask (c, d, 2); + return _mm512_mask_mov_epi64 (a, ~(k1 ^ k2), c); +}