From patchwork Mon Oct 31 01:23:10 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: liuhongt
X-Patchwork-Id: 13093
To: gcc-patches@gcc.gnu.org
Subject: [PATCH] Enable more optimization for 32-bit/64-bit shrd/shld with imm shift count.
Date: Mon, 31 Oct 2022 09:23:10 +0800
Message-Id: <20221031012310.1237451-1-hongtao.liu@intel.com>
X-Mailer: git-send-email 2.27.0
List-Id: Gcc-patches mailing list
From: liuhongt
Reply-To: liuhongt

This patch doesn't handle variable counts, since that requires 5 insns to be
combined to get the wanted pattern, but the current pass_combine only
supports at most 4.
This patch doesn't handle 16-bit shrd/shld either.
Ideally, we could avoid the redundancy of
*x86_64_shld_shrd_1_nozext/*x86_shld_shrd_1_nozext if the middle end could
recognize that they're just variants of
*x86_64_shrd_shld_1_nozext/*x86_shrd_shld_1_nozext with ashift/lshiftrt
swapped in the ior, which is commutative. But currently it doesn't, so I
add both of them in the patch.

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ok for trunk?

gcc/ChangeLog:

        PR target/55583
        * config/i386/i386.md (*x86_64_shld_1): Rename to ..
        (x86_64_shld_1): .. this.
        (*x86_shld_1): Rename to ..
        (x86_shld_1): .. this.
        (*x86_64_shrd_1): Rename to ..
        (x86_64_shrd_1): .. this.
        (*x86_shrd_1): Rename to ..
        (x86_shrd_1): .. this.
        (*x86_64_shld_shrd_1_nozext): New pre_reload splitter.
        (*x86_shld_shrd_1_nozext): Ditto.
        (*x86_64_shrd_shld_1_nozext): Ditto.
        (*x86_shrd_shld_1_nozext): Ditto.

gcc/testsuite/ChangeLog:

        * gcc.target/i386/pr55583.c: New test.
---
 gcc/config/i386/i386.md                 | 150 +++++++++++++++++++++++-
 gcc/testsuite/gcc.target/i386/pr55583.c |  27 +++++
 2 files changed, 173 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr55583.c

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index baf1f1f8fa2..a3ac319f0d7 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -12470,7 +12470,7 @@ (define_insn "x86_64_shld"
    (set_attr "amdfam10_decode" "vector")
    (set_attr "bdver1_decode" "vector")])
 
-(define_insn "*x86_64_shld_1"
+(define_insn "x86_64_shld_1"
   [(set (match_operand:DI 0 "nonimmediate_operand" "+r*m")
         (ior:DI (ashift:DI (match_dup 0)
                   (match_operand:QI 2 "const_0_to_63_operand"))
@@ -12491,6 +12491,42 @@ (define_insn "*x86_64_shld_1"
    (set_attr "amdfam10_decode" "vector")
    (set_attr "bdver1_decode" "vector")])
 
+(define_insn_and_split "*x86_64_shld_shrd_1_nozext"
+  [(set (match_operand:DI 0 "nonimmediate_operand")
+        (ior:DI (ashift:DI (match_operand:DI 4 "nonimmediate_operand")
+                           (match_operand:QI 2 "const_0_to_63_operand"))
+                (lshiftrt:DI
+                  (match_operand:DI 1 "nonimmediate_operand")
+                  (match_operand:QI 3 "const_0_to_63_operand"))))
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_64BIT
+   && INTVAL (operands[3]) == 64 - INTVAL (operands[2])
+   && ix86_pre_reload_split ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+{
+  if (rtx_equal_p (operands[4], operands[0]))
+    {
+      operands[1] = force_reg (DImode, operands[1]);
+      emit_insn (gen_x86_64_shld_1 (operands[0], operands[1], operands[2], operands[3]));
+    }
+  else if (rtx_equal_p (operands[1], operands[0]))
+    {
+      operands[4] = force_reg (DImode, operands[4]);
+      emit_insn (gen_x86_64_shrd_1 (operands[0], operands[4], operands[3], operands[2]));
+    }
+  else
+    {
+      operands[1] = force_reg (DImode, operands[1]);
+      rtx tmp = gen_reg_rtx (DImode);
+      emit_move_insn (tmp, operands[4]);
+      emit_insn (gen_x86_64_shld_1 (tmp, operands[1], operands[2], operands[3]));
+      emit_move_insn (operands[0], tmp);
+    }
+  DONE;
+})
+
 (define_insn_and_split "*x86_64_shld_2"
   [(set (match_operand:DI 0 "nonimmediate_operand")
         (ior:DI (ashift:DI (match_dup 0)
@@ -12534,7 +12570,7 @@ (define_insn "x86_shld"
    (set_attr "amdfam10_decode" "vector")
    (set_attr "bdver1_decode" "vector")])
 
-(define_insn "*x86_shld_1"
+(define_insn "x86_shld_1"
   [(set (match_operand:SI 0 "nonimmediate_operand" "+r*m")
         (ior:SI (ashift:SI (match_dup 0)
                   (match_operand:QI 2 "const_0_to_31_operand"))
@@ -12555,6 +12591,41 @@ (define_insn "*x86_shld_1"
    (set_attr "amdfam10_decode" "vector")
    (set_attr "bdver1_decode" "vector")])
 
+(define_insn_and_split "*x86_shld_shrd_1_nozext"
+  [(set (match_operand:SI 0 "nonimmediate_operand")
+        (ior:SI (ashift:SI (match_operand:SI 4 "nonimmediate_operand")
+                           (match_operand:QI 2 "const_0_to_31_operand"))
+                (lshiftrt:SI
+                  (match_operand:SI 1 "nonimmediate_operand")
+                  (match_operand:QI 3 "const_0_to_31_operand"))))
+   (clobber (reg:CC FLAGS_REG))]
+  "INTVAL (operands[3]) == 32 - INTVAL (operands[2])
+   && ix86_pre_reload_split ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+{
+  if (rtx_equal_p (operands[4], operands[0]))
+    {
+      operands[1] = force_reg (SImode, operands[1]);
+      emit_insn (gen_x86_shld_1 (operands[0], operands[1], operands[2], operands[3]));
+    }
+  else if (rtx_equal_p (operands[1], operands[0]))
+    {
+      operands[4] = force_reg (SImode, operands[4]);
+      emit_insn (gen_x86_shrd_1 (operands[0], operands[4], operands[3], operands[2]));
+    }
+  else
+    {
+      operands[1] = force_reg (SImode, operands[1]);
+      rtx tmp = gen_reg_rtx (SImode);
+      emit_move_insn (tmp, operands[4]);
+      emit_insn (gen_x86_shld_1 (tmp, operands[1], operands[2], operands[3]));
+      emit_move_insn (operands[0], tmp);
+    }
+  DONE;
+})
+
 (define_insn_and_split "*x86_shld_2"
   [(set (match_operand:SI 0 "nonimmediate_operand")
         (ior:SI (ashift:SI (match_dup 0)
@@ -13433,7 +13504,7 @@ (define_insn "x86_64_shrd"
    (set_attr "amdfam10_decode" "vector")
    (set_attr "bdver1_decode" "vector")])
 
-(define_insn "*x86_64_shrd_1"
+(define_insn "x86_64_shrd_1"
   [(set (match_operand:DI 0 "nonimmediate_operand" "+r*m")
         (ior:DI (lshiftrt:DI (match_dup 0)
                   (match_operand:QI 2 "const_0_to_63_operand"))
@@ -13454,6 +13525,42 @@ (define_insn "*x86_64_shrd_1"
    (set_attr "amdfam10_decode" "vector")
    (set_attr "bdver1_decode" "vector")])
 
+(define_insn_and_split "*x86_64_shrd_shld_1_nozext"
+  [(set (match_operand:DI 0 "nonimmediate_operand")
+        (ior:DI (lshiftrt:DI (match_operand:DI 4 "nonimmediate_operand")
+                             (match_operand:QI 2 "const_0_to_63_operand"))
+                (ashift:DI
+                  (match_operand:DI 1 "nonimmediate_operand")
+                  (match_operand:QI 3 "const_0_to_63_operand"))))
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_64BIT
+   && INTVAL (operands[3]) == 64 - INTVAL (operands[2])
+   && ix86_pre_reload_split ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+{
+  if (rtx_equal_p (operands[4], operands[0]))
+    {
+      operands[1] = force_reg (DImode, operands[1]);
+      emit_insn (gen_x86_64_shrd_1 (operands[0], operands[1], operands[2], operands[3]));
+    }
+  else if (rtx_equal_p (operands[1], operands[0]))
+    {
+      operands[4] = force_reg (DImode, operands[4]);
+      emit_insn (gen_x86_64_shld_1 (operands[0], operands[4], operands[3], operands[2]));
+    }
+  else
+    {
+      operands[1] = force_reg (DImode, operands[1]);
+      rtx tmp = gen_reg_rtx (DImode);
+      emit_move_insn (tmp, operands[4]);
+      emit_insn (gen_x86_64_shrd_1 (tmp, operands[1], operands[2], operands[3]));
+      emit_move_insn (operands[0], tmp);
+    }
+  DONE;
+})
+
 (define_insn_and_split "*x86_64_shrd_2"
   [(set (match_operand:DI 0 "nonimmediate_operand")
         (ior:DI (lshiftrt:DI (match_dup 0)
@@ -13497,7 +13604,7 @@ (define_insn "x86_shrd"
    (set_attr "amdfam10_decode" "vector")
    (set_attr "bdver1_decode" "vector")])
 
-(define_insn "*x86_shrd_1"
+(define_insn "x86_shrd_1"
   [(set (match_operand:SI 0 "nonimmediate_operand" "+r*m")
         (ior:SI (lshiftrt:SI (match_dup 0)
                   (match_operand:QI 2 "const_0_to_31_operand"))
@@ -13518,6 +13625,41 @@ (define_insn "*x86_shrd_1"
    (set_attr "amdfam10_decode" "vector")
    (set_attr "bdver1_decode" "vector")])
 
+(define_insn_and_split "*x86_shrd_shld_1_nozext"
+  [(set (match_operand:SI 0 "nonimmediate_operand")
+        (ior:SI (lshiftrt:SI (match_operand:SI 4 "nonimmediate_operand")
+                             (match_operand:QI 2 "const_0_to_31_operand"))
+                (ashift:SI
+                  (match_operand:SI 1 "nonimmediate_operand")
+                  (match_operand:QI 3 "const_0_to_31_operand"))))
+   (clobber (reg:CC FLAGS_REG))]
+  "INTVAL (operands[3]) == 32 - INTVAL (operands[2])
+   && ix86_pre_reload_split ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+{
+  if (rtx_equal_p (operands[4], operands[0]))
+    {
+      operands[1] = force_reg (SImode, operands[1]);
+      emit_insn (gen_x86_shrd_1 (operands[0], operands[1], operands[2], operands[3]));
+    }
+  else if (rtx_equal_p (operands[1], operands[0]))
+    {
+      operands[4] = force_reg (SImode, operands[4]);
+      emit_insn (gen_x86_shld_1 (operands[0], operands[4], operands[3], operands[2]));
+    }
+  else
+    {
+      operands[1] = force_reg (SImode, operands[1]);
+      rtx tmp = gen_reg_rtx (SImode);
+      emit_move_insn (tmp, operands[4]);
+      emit_insn (gen_x86_shrd_1 (tmp, operands[1], operands[2], operands[3]));
+      emit_move_insn (operands[0], tmp);
+    }
+  DONE;
+})
+
 (define_insn_and_split "*x86_shrd_2"
   [(set (match_operand:SI 0 "nonimmediate_operand")
         (ior:SI (lshiftrt:SI (match_dup 0)
diff --git a/gcc/testsuite/gcc.target/i386/pr55583.c b/gcc/testsuite/gcc.target/i386/pr55583.c
new file mode 100644
index 00000000000..1c128b5d929
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr55583.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -Wno-shift-count-overflow" } */
+/* { dg-final { scan-assembler-times {(?n)shrd[ql]?[\t ]*\$2} 4 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times {(?n)shrdl?[\t ]*\$2} 2 { target ia32 } } } */
+/* { dg-final { scan-assembler-times {(?n)shldl?[\t ]*\$2} 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times {(?n)shld[ql]?[\t ]*\$2} 2 { target { ! ia32 } } } } */
+
+typedef unsigned long u64;
+typedef unsigned int u32;
+typedef unsigned short u16;
+
+long a, b;
+int c, d;
+short e, f;
+const int n = 2;
+
+void test64r () { b = ((u64)b >> n) | (a << (64 - n)); }
+void test32r () { d = ((u32)d >> n) | (c << (32 - n)); }
+
+unsigned long ua, ub;
+unsigned int uc, ud;
+unsigned short ue, uf;
+
+void testu64l () { ub = (ub << n) | (ua >> (64 - n)); }
+void testu64r () { ub = (ub >> n) | (ua << (64 - n)); }
+void testu32l () { ud = (ud << n) | (uc >> (32 - n)); }
+void testu32r () { ud = (ud >> n) | (uc << (32 - n)); }
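
The new splitters only match when the two immediate counts add up to the mode
width (operands[3] == 32 - operands[2] for SImode, 64 for DImode). Under that
condition the ior of the two shifts equals a double-word shift, which is what
shrd/shld compute. The C sketch below is illustrative only and is not part of
the patch: the function names are made up for this note, and the "model"
functions are an assumed description of the 32-bit instruction result for
counts 1..31, not GCC's RTL.

```c
#include <assert.h>   /* for callers that want to assert on the result */
#include <stddef.h>
#include <stdint.h>

/* Assumed model of 32-bit SHRD: shift the 64-bit value src:dst (src in the
   high half) right by n and keep the low 32 bits.  */
static uint32_t shrd32_model (uint32_t dst, uint32_t src, unsigned n)
{
  uint64_t wide = ((uint64_t) src << 32) | dst;
  return (uint32_t) (wide >> n);
}

/* Assumed model of 32-bit SHLD: shift the 64-bit value dst:src (dst in the
   high half) left by n and keep the high 32 bits.  */
static uint32_t shld32_model (uint32_t dst, uint32_t src, unsigned n)
{
  uint64_t wide = ((uint64_t) dst << 32) | src;
  return (uint32_t) ((wide << n) >> 32);
}

/* The shift-pair forms used in the testcase; valid only for 0 < n < 32,
   so that neither shift count reaches the type width.  */
static uint32_t shrd32_pair (uint32_t dst, uint32_t src, unsigned n)
{
  return (dst >> n) | (src << (32 - n));
}

static uint32_t shld32_pair (uint32_t dst, uint32_t src, unsigned n)
{
  return (dst << n) | (src >> (32 - n));
}

/* Compare pair forms against the models over a few sample values and all
   counts 1..31; returns the number of mismatches found.  */
static unsigned shift_pair_mismatches (void)
{
  static const uint32_t samples[] = {
    0u, 1u, 0x80000000u, 0x80000001u, 0xDEADBEEFu, 0xFFFFFFFFu
  };
  const size_t count = sizeof samples / sizeof samples[0];
  unsigned mismatches = 0;

  for (unsigned n = 1; n < 32; n++)
    for (size_t i = 0; i < count; i++)
      for (size_t j = 0; j < count; j++)
        {
          if (shrd32_pair (samples[i], samples[j], n)
              != shrd32_model (samples[i], samples[j], n))
            mismatches++;
          if (shld32_pair (samples[i], samples[j], n)
              != shld32_model (samples[i], samples[j], n))
            mismatches++;
        }
  return mismatches;
}
```

Calling shift_pair_mismatches () from a small driver is expected to return 0.
The count is restricted to 1..31 because a count of 0 would make the
complementary shift equal to the type width, which is undefined in C.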