From patchwork Wed Dec 28 01:15:23 2022
X-Patchwork-Submitter: Roger Sayle
X-Patchwork-Id: 37060
From: "Roger Sayle"
To: "'GCC Patches'"
Cc: "'Uros Bizjak'"
Subject: [x86 PATCH] Provide zero_extend versions/variants of several patterns.
Date: Wed, 28 Dec 2022 01:15:23 -0000
Message-ID: <00e801d91a59$df0f3e70$9d2dbb50$@nextmovesoftware.com>

Back in September, the review of my patch for PR rtl-optimization/106594,
https://gcc.gnu.org/pipermail/gcc-patches/2022-September/601501.html
suggested that I submit the x86 backend bits independently, and first.

The executive summary is that the middle-end doesn't have a preferred
canonical form for expressing zero-extension, sometimes using an AND
and sometimes using zero_extend.  Pending changes to RTL simplification
may alter some of these representations, so a few additional patterns
are required to recognize these alternate representations and avoid
any testsuite regressions.

As an example, *popcountsi2_zext is currently represented as:

  [(set (match_operand:DI 0 "register_operand" "=r")
        (and:DI
          (subreg:DI
            (popcount:SI
              (match_operand:SI 1 "nonimmediate_operand" "rm")) 0)
          (const_int 63)))
   (clobber (reg:CC FLAGS_REG))]

This patch adds an alternate, equivalent pattern that matches:

  [(set (match_operand:DI 0 "register_operand" "=r")
        (zero_extend:DI
          (popcount:SI (match_operand:SI 1 "nonimmediate_operand" "rm"))))
   (clobber (reg:CC FLAGS_REG))]

Another example is *popcounthi2 which is currently represented as:

  [(set (match_operand:SI 0 "register_operand")
        (popcount:SI
          (zero_extend:SI
            (match_operand:HI 1 "nonimmediate_operand"))))
   (clobber (reg:CC FLAGS_REG))]

This patch adds an alternate, equivalent pattern that matches:

  [(set (match_operand:SI 0 "register_operand")
        (zero_extend:SI
          (popcount:HI (match_operand:HI 1 "nonimmediate_operand"))))
   (clobber (reg:CC FLAGS_REG))]

The contents of the machine description definitions remain the same;
only the expected RTL is slightly different, but equivalent.  Providing
both forms makes the backend more robust to middle-end changes (and
possibly catches some missed optimizations).

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32},
with no new failures.  Ok for mainline?


2022-12-28  Roger Sayle

gcc/ChangeLog
	* config/i386/i386.md (*clzsi2_lzcnt_zext_2): New
	define_insn_and_split to match ZERO_EXTEND form of
	*clzsi2_lzcnt_zext.
	(*clzsi2_lzcnt_zext_2_falsedep): Likewise, new define_insn to match
	ZERO_EXTEND form of *clzsi2_lzcnt_zext_falsedep.
	(*bmi2_bzhi_zero_extendsidi_5): Likewise, new define_insn to match
	ZERO_EXTEND form of *bmi2_bzhi_zero_extendsidi.
	(*popcountsi2_zext_2): Likewise, new define_insn_and_split to match
	ZERO_EXTEND form of *popcountsi2_zext.
	(*popcountsi2_zext_2_falsedep): Likewise, new define_insn to match
	ZERO_EXTEND form of *popcountsi2_zext_falsedep.
	(*popcounthi2_2): Likewise, new define_insn_and_split to match
	ZERO_EXTEND form of *popcounthi2.
	(define_peephole2): ZERO_EXTEND variant of HImode popcount&1 using
	parity flag peephole2.

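
For illustration only (this is not part of the patch, and the function
names are hypothetical), the C sketch below shows why the paired forms in
the summary above are interchangeable at the source level.  The popcount
of a 32-bit value lies in [0, 32], so masking with 63 and zero-extending
give the same 64-bit result; likewise, counting bits in a zero-extended
16-bit value equals zero-extending the 16-bit count.  It assumes a
GCC-compatible compiler that provides __builtin_popcount, and makes no
claim about which RTL form the middle-end chooses for any given function.

```c
#include <stdint.h>

/* Widening written as a plain conversion (the "zero_extend" view).  */
uint64_t popcount_zext(uint32_t x)
{
    return (uint64_t)(uint32_t)__builtin_popcount(x);
}

/* Widening written as a mask (the "AND with 63" view).  The count never
   exceeds 32, so the mask discards no set bits.  */
uint64_t popcount_mask(uint32_t x)
{
    return (uint64_t)__builtin_popcount(x) & 63;
}

/* Count the bits of a 16-bit value after widening it to 32 bits.  */
uint32_t popcount16_wide(uint16_t x)
{
    return (uint32_t)__builtin_popcount((uint32_t)x);
}

/* Count in 16 bits first, then widen the (at most 16) result.  */
uint32_t popcount16_narrow(uint16_t x)
{
    uint16_t n = (uint16_t)__builtin_popcount(x);
    return (uint32_t)n;
}
```

Each pair returns the same value for every input, which is the sense in
which the new patterns are described as equivalent to the existing ones.
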
Thanks in advance,
Roger

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 0626752..ca40c4f 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -17419,6 +17419,42 @@
    (set_attr "type" "bitmanip")
    (set_attr "mode" "SI")])
 
+(define_insn_and_split "*clzsi2_lzcnt_zext_2"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+        (zero_extend:DI
+          (clz:SI (match_operand:SI 1 "nonimmediate_operand" "rm"))))
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_LZCNT && TARGET_64BIT"
+  "lzcnt{l}\t{%1, %k0|%k0, %1}"
+  "&& TARGET_AVOID_FALSE_DEP_FOR_BMI && epilogue_completed
+   && optimize_function_for_speed_p (cfun)
+   && !reg_mentioned_p (operands[0], operands[1])"
+  [(parallel
+    [(set (match_dup 0)
+          (zero_extend:DI (clz:SI (match_dup 1))))
+     (unspec [(match_dup 0)] UNSPEC_INSN_FALSE_DEP)
+     (clobber (reg:CC FLAGS_REG))])]
+  "ix86_expand_clear (operands[0]);"
+  [(set_attr "prefix_rep" "1")
+   (set_attr "type" "bitmanip")
+   (set_attr "mode" "SI")])
+
+; False dependency happens when destination is only updated by tzcnt,
+; lzcnt or popcnt.  There is no false dependency when destination is
+; also used in source.
+(define_insn "*clzsi2_lzcnt_zext_2_falsedep"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+        (zero_extend:DI
+          (clz:SI (match_operand:SWI48 1 "nonimmediate_operand" "rm"))))
+   (unspec [(match_operand:DI 2 "register_operand" "0")]
+           UNSPEC_INSN_FALSE_DEP)
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_LZCNT"
+  "lzcnt{l}\t{%1, %k0|%k0, %1}"
+  [(set_attr "prefix_rep" "1")
+   (set_attr "type" "bitmanip")
+   (set_attr "mode" "SI")])
+
 (define_int_iterator LT_ZCNT
   [(UNSPEC_TZCNT "TARGET_BMI")
    (UNSPEC_LZCNT "TARGET_LZCNT")])
@@ -17737,6 +17773,22 @@
    (set_attr "prefix" "vex")
    (set_attr "mode" "DI")])
 
+(define_insn "*bmi2_bzhi_zero_extendsidi_5"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+        (and:DI
+          (zero_extend:DI
+            (plus:SI
+              (ashift:SI (const_int 1)
+                         (match_operand:QI 2 "register_operand" "r"))
+              (const_int -1)))
+          (match_operand:DI 1 "nonimmediate_operand" "rm")))
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_64BIT && TARGET_BMI2"
+  "bzhi\t{%q2, %q1, %q0|%q0, %q1, %q2}"
+  [(set_attr "type" "bitmanip")
+   (set_attr "prefix" "vex")
+   (set_attr "mode" "DI")])
+
 (define_insn "bmi2_pdep_3"
   [(set (match_operand:SWI48 0 "register_operand" "=r")
         (unspec:SWI48 [(match_operand:SWI48 1 "register_operand" "r")
@@ -17999,6 +18051,54 @@
    (set_attr "type" "bitmanip")
    (set_attr "mode" "SI")])
 
+(define_insn_and_split "*popcountsi2_zext_2"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+        (zero_extend:DI
+          (popcount:SI (match_operand:SI 1 "nonimmediate_operand" "rm"))))
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_POPCNT && TARGET_64BIT"
+{
+#if TARGET_MACHO
+  return "popcnt\t{%1, %k0|%k0, %1}";
+#else
+  return "popcnt{l}\t{%1, %k0|%k0, %1}";
+#endif
+}
+  "&& TARGET_AVOID_FALSE_DEP_FOR_BMI && epilogue_completed
+   && optimize_function_for_speed_p (cfun)
+   && !reg_mentioned_p (operands[0], operands[1])"
+  [(parallel
+    [(set (match_dup 0)
+          (zero_extend:DI (popcount:SI (match_dup 1))))
+     (unspec [(match_dup 0)] UNSPEC_INSN_FALSE_DEP)
+     (clobber (reg:CC FLAGS_REG))])]
+  "ix86_expand_clear (operands[0]);"
+  [(set_attr "prefix_rep" "1")
+   (set_attr "type" "bitmanip")
+   (set_attr "mode" "SI")])
+
+; False dependency happens when destination is only updated by tzcnt,
+; lzcnt or popcnt.  There is no false dependency when destination is
+; also used in source.
+(define_insn "*popcountsi2_zext_2_falsedep"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+        (zero_extend:DI
+          (popcount:SI (match_operand:SI 1 "nonimmediate_operand" "rm"))))
+   (unspec [(match_operand:DI 2 "register_operand" "0")]
+           UNSPEC_INSN_FALSE_DEP)
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_POPCNT && TARGET_64BIT"
+{
+#if TARGET_MACHO
+  return "popcnt\t{%1, %k0|%k0, %1}";
+#else
+  return "popcnt{l}\t{%1, %k0|%k0, %1}";
+#endif
+}
+  [(set_attr "prefix_rep" "1")
+   (set_attr "type" "bitmanip")
+   (set_attr "mode" "SI")])
+
 (define_insn_and_split "*popcounthi2_1"
   [(set (match_operand:SI 0 "register_operand")
         (popcount:SI
@@ -18017,6 +18117,24 @@
   DONE;
 })
 
+(define_insn_and_split "*popcounthi2_2"
+  [(set (match_operand:SI 0 "register_operand")
+        (zero_extend:SI
+          (popcount:HI (match_operand:HI 1 "nonimmediate_operand"))))
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_POPCNT
+   && ix86_pre_reload_split ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+{
+  rtx tmp = gen_reg_rtx (HImode);
+
+  emit_insn (gen_popcounthi2 (tmp, operands[1]));
+  emit_insn (gen_zero_extendhisi2 (operands[0], tmp));
+  DONE;
+})
+
 (define_insn "popcounthi2"
   [(set (match_operand:HI 0 "register_operand" "=r")
         (popcount:HI
@@ -18336,6 +18454,39 @@
   PUT_CODE (operands[5], GET_CODE (operands[5]) == EQ ? UNORDERED : ORDERED);
 })
 
+;; Eliminate HImode popcount&1 using parity flag (variant 2)
+(define_peephole2
+  [(match_scratch:HI 0 "Q")
+   (parallel [(set (match_operand:HI 1 "register_operand")
+                   (popcount:HI
+                    (match_operand:HI 2 "nonimmediate_operand")))
+              (clobber (reg:CC FLAGS_REG))])
+   (set (reg:CCZ FLAGS_REG)
+        (compare:CCZ (and:QI (match_operand:QI 3 "register_operand")
+                             (const_int 1))
+                     (const_int 0)))
+   (set (pc) (if_then_else (match_operator 4 "bt_comparison_operator"
+                            [(reg:CCZ FLAGS_REG)
+                             (const_int 0)])
+                           (label_ref (match_operand 5))
+                           (pc)))]
+  "REGNO (operands[1]) == REGNO (operands[3])
+   && peep2_reg_dead_p (2, operands[1])
+   && peep2_reg_dead_p (2, operands[3])
+   && peep2_regno_dead_p (3, FLAGS_REG)"
+  [(set (match_dup 0) (match_dup 2))
+   (parallel [(set (reg:CC FLAGS_REG)
+                   (unspec:CC [(match_dup 0)] UNSPEC_PARITY))
+              (clobber (match_dup 0))])
+   (set (pc) (if_then_else (match_op_dup 4 [(reg:CC FLAGS_REG)
+                                            (const_int 0)])
+                           (label_ref (match_dup 5))
+                           (pc)))]
+{
+  operands[4] = shallow_copy_rtx (operands[4]);
+  PUT_CODE (operands[4], GET_CODE (operands[4]) == EQ ? UNORDERED : ORDERED);
+})
+
 
 ;; Thread-local storage patterns for ELF.
 ;;