From patchwork Wed Sep 6 10:46:28 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Xi Ruoyao
X-Patchwork-Id: 137571
To: gcc-patches@gcc.gnu.org
Subject: [PATCH] LoongArch: Use bstrins instruction for (a & ~mask) and
 (a & mask) | (b & ~mask) [PR111252]
Date: Wed, 6 Sep 2023 18:46:28 +0800
Message-ID: <20230906104628.51362-1-xry111@xry111.site>
X-Mailer: git-send-email 2.42.0
From: Xi Ruoyao
Cc: chenxiaolong, xuchenghua@loongson.cn, chenglulu, i@xen0n.name

If mask is a constant with value ((1 << N) - 1) << M we can perform this
optimization.

gcc/ChangeLog:

	PR target/111252
	* config/loongarch/loongarch-protos.h
	(loongarch_pre_reload_split): Declare new function.
	(loongarch_use_bstrins_for_ior_with_mask): Likewise.
	* config/loongarch/loongarch.cc
	(loongarch_pre_reload_split): Implement.
	(loongarch_use_bstrins_for_ior_with_mask): Likewise.
	* config/loongarch/predicates.md (ins_zero_bitmask_operand):
	New predicate.
	* config/loongarch/loongarch.md (bstrins__for_mask):
	New define_insn_and_split.
	(bstrins__for_ior_mask): Likewise.
	(define_peephole2): Further optimize code sequence produced by
	bstrins__for_ior_mask if possible.

gcc/testsuite/ChangeLog:

	* g++.target/loongarch/bstrins-compile.C: New test.
	* g++.target/loongarch/bstrins-run.C: New test.
---
 gcc/config/loongarch/loongarch-protos.h       |  4 +-
 gcc/config/loongarch/loongarch.cc             | 36 ++++++++
 gcc/config/loongarch/loongarch.md             | 91 +++++++++++++++++++
 gcc/config/loongarch/predicates.md            |  8 ++
 .../g++.target/loongarch/bstrins-compile.C    | 22 +++++
 .../g++.target/loongarch/bstrins-run.C        | 65 +++++++++++++
 6 files changed, 225 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.target/loongarch/bstrins-compile.C
 create mode 100644 gcc/testsuite/g++.target/loongarch/bstrins-run.C

diff --git a/gcc/config/loongarch/loongarch-protos.h b/gcc/config/loongarch/loongarch-protos.h
index f4430d0d418..251011c5414 100644
--- a/gcc/config/loongarch/loongarch-protos.h
+++ b/gcc/config/loongarch/loongarch-protos.h
@@ -56,7 +56,7 @@ enum loongarch_symbol_type {
 };
 #define NUM_SYMBOL_TYPES (SYMBOL_TLSLDM + 1)
 
-/* Routines implemented in loongarch.c.  */
+/* Routines implemented in loongarch.cc.  */
 extern rtx loongarch_emit_move (rtx, rtx);
 extern HOST_WIDE_INT loongarch_initial_elimination_offset (int, int);
 extern void loongarch_expand_prologue (void);
@@ -163,6 +163,8 @@ extern const char *current_section_name (void);
 extern unsigned int current_section_flags (void);
 extern bool loongarch_use_ins_ext_p (rtx, HOST_WIDE_INT, HOST_WIDE_INT);
 extern bool loongarch_check_zero_div_p (void);
+extern bool loongarch_pre_reload_split (void);
+extern int loongarch_use_bstrins_for_ior_with_mask (machine_mode, rtx *);
 
 union loongarch_gen_fn_ptrs
 {
diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc
index aeb37f0f2f7..6698414281e 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -5482,6 +5482,42 @@ loongarch_use_ins_ext_p (rtx op, HOST_WIDE_INT width, HOST_WIDE_INT bitpos)
   return true;
 }
 
+/* Predicate for pre-reload splitters with associated instructions,
+   which can match any time before the split1 pass (usually combine),
+   then are unconditionally split in that pass and should not be
+   matched again afterwards.  */
+
+bool loongarch_pre_reload_split (void)
+{
+  return (can_create_pseudo_p ()
+          && !(cfun->curr_properties & PROP_rtl_split_insns));
+}
+
+/* Check if we can use bstrins. for
+   op0 = (op1 & op2) | (op3 & op4)
+   where op0, op1, op3 are regs, and op2, op4 are integer constants.  */
+int
+loongarch_use_bstrins_for_ior_with_mask (machine_mode mode, rtx *op)
+{
+  unsigned HOST_WIDE_INT mask1 = UINTVAL (op[2]);
+  unsigned HOST_WIDE_INT mask2 = UINTVAL (op[4]);
+
+  if (mask1 != ~mask2 || !mask1 || !mask2)
+    return 0;
+
+  /* Try to avoid a right-shift.  */
+  if (low_bitmask_len (mode, mask1) != -1)
+    return -1;
+
+  if (low_bitmask_len (mode, mask2 >> (ffs_hwi (mask2) - 1)) != -1)
+    return 1;
+
+  if (low_bitmask_len (mode, mask1 >> (ffs_hwi (mask1) - 1)) != -1)
+    return -1;
+
+  return 0;
+}
+
 /* Print the text for PRINT_OPERAND punctation character CH to FILE.
 
    The punctuation characters are:
diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md
index 2308db16902..75f641b38ee 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -1322,6 +1322,97 @@ (define_insn "and3_extended"
   [(set_attr "move_type" "pick_ins")
    (set_attr "mode" "")])
 
+(define_insn_and_split "*bstrins__for_mask"
+  [(set (match_operand:GPR 0 "register_operand")
+        (and:GPR (match_operand:GPR 1 "register_operand")
+                 (match_operand:GPR 2 "ins_zero_bitmask_operand")))]
+  ""
+  "#"
+  ""
+  [(set (match_dup 0) (match_dup 1))
+   (set (zero_extract:GPR (match_dup 0) (match_dup 2) (match_dup 3))
+        (const_int 0))]
+  {
+    unsigned HOST_WIDE_INT mask = ~UINTVAL (operands[2]);
+    int lo = ffs_hwi (mask) - 1;
+    int len = low_bitmask_len (mode, mask >> lo);
+
+    len = MIN (len, GET_MODE_BITSIZE (mode) - lo);
+    operands[2] = GEN_INT (len);
+    operands[3] = GEN_INT (lo);
+  })
+
+(define_insn_and_split "*bstrins__for_ior_mask"
+  [(set (match_operand:GPR 0 "register_operand")
+        (ior:GPR (and:GPR (match_operand:GPR 1 "register_operand")
+                          (match_operand:GPR 2 "const_int_operand"))
+                 (and:GPR (match_operand:GPR 3 "register_operand")
+                          (match_operand:GPR 4 "const_int_operand"))))]
+  "loongarch_pre_reload_split () && \
+   loongarch_use_bstrins_for_ior_with_mask (mode, operands)"
+  "#"
+  ""
+  [(set (match_dup 0) (match_dup 1))
+   (set (zero_extract:GPR (match_dup 0) (match_dup 2) (match_dup 4))
+        (match_dup 3))]
+  {
+    if (loongarch_use_bstrins_for_ior_with_mask (mode, operands) < 0)
+      {
+        std::swap (operands[1], operands[3]);
+        std::swap (operands[2], operands[4]);
+      }
+
+    unsigned HOST_WIDE_INT mask = ~UINTVAL (operands[2]);
+    int lo = ffs_hwi (mask) - 1;
+    int len = low_bitmask_len (mode, mask >> lo);
+
+    len = MIN (len, GET_MODE_BITSIZE (mode) - lo);
+    operands[2] = GEN_INT (len);
+    operands[4] = GEN_INT (lo);
+
+    if (lo)
+      {
+        rtx tmp = gen_reg_rtx (mode);
+        emit_move_insn (tmp, gen_rtx_ASHIFTRT(mode, operands[3],
+                                              GEN_INT (lo)));
+        operands[3] = tmp;
+      }
+  })
+
+;; We always avoid the shift operation in bstrins__for_ior_mask
+;; if possible, but the result may be sub-optimal when one of the masks
+;; is (1 << N) - 1 and one of the src register is the dest register.
+;; For example:
+;;     move      t0, a0
+;;     move      a0, a1
+;;     bstrins.d a0, t0, 42, 0
+;;     ret
+;; using a shift operation would be better:
+;;     srai.d    t0, a1, 43
+;;     bstrins.d a0, t0, 63, 43
+;;     ret
+;; unfortunately we cannot figure it out in split1: before reload we cannot
+;; know if the dest register is one of the src register.  Fix it up in
+;; peephole2.
+(define_peephole2
+  [(set (match_operand:GPR 0 "register_operand")
+        (match_operand:GPR 1 "register_operand"))
+   (set (match_dup 1) (match_operand:GPR 2 "register_operand"))
+   (set (zero_extract:GPR (match_dup 1)
+                          (match_operand:SI 3 "const_int_operand")
+                          (const_int 0))
+        (match_dup 0))]
+  "peep2_reg_dead_p (3, operands[0])"
+  [(const_int 0)]
+  {
+    int len = GET_MODE_BITSIZE (mode) - INTVAL (operands[3]);
+
+    emit_insn (gen_ashr3 (operands[0], operands[2], operands[3]));
+    emit_insn (gen_insv (operands[1], GEN_INT (len), operands[3],
+                         operands[0]));
+    DONE;
+  })
+
 (define_insn "*iorhi3"
   [(set (match_operand:HI 0 "register_operand" "=r,r")
         (ior:HI (match_operand:HI 1 "register_operand" "%r,r")
diff --git a/gcc/config/loongarch/predicates.md b/gcc/config/loongarch/predicates.md
index f430629825e..499518b82ba 100644
--- a/gcc/config/loongarch/predicates.md
+++ b/gcc/config/loongarch/predicates.md
@@ -408,6 +408,14 @@ (define_predicate "fcc_reload_operand"
 (define_predicate "muldiv_target_operand"
                 (match_operand 0 "register_operand"))
 
+(define_predicate "ins_zero_bitmask_operand"
+  (and (match_code "const_int")
+       (match_test "INTVAL (op) != -1")
+       (match_test "INTVAL (op) & 1")
+       (match_test "low_bitmask_len (mode, \
+                                     ~UINTVAL (op) | (~UINTVAL(op) - 1)) \
+                    > 12")))
+
 (define_predicate "const_call_insn_operand"
   (match_code "const,symbol_ref,label_ref")
 {
diff --git a/gcc/testsuite/g++.target/loongarch/bstrins-compile.C b/gcc/testsuite/g++.target/loongarch/bstrins-compile.C
new file mode 100644
index 00000000000..3c0db1de4c6
--- /dev/null
+++ b/gcc/testsuite/g++.target/loongarch/bstrins-compile.C
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "-std=c++14 -O2 -march=loongarch64 -mabi=lp64d" } */
+/* { dg-final { scan-assembler "bstrins\\.d.*7,4" } } */
+/* { dg-final { scan-assembler "bstrins\\.d.*15,4" } } */
+/* { dg-final { scan-assembler "bstrins\\.d.*31,4" } } */
+/* { dg-final { scan-assembler "bstrins\\.d.*47,4" } } */
+/* { dg-final { scan-assembler "bstrins\\.d.*3,0" } } */
+
+typedef unsigned long u64;
+
+template
+u64
+test (u64 a, u64 b)
+{
+  return (a & mask) | (b & ~mask);
+}
+
+template u64 test<0x0000'0000'0000'00f0l> (u64, u64);
+template u64 test<0x0000'0000'0000'fff0l> (u64, u64);
+template u64 test<0x0000'0000'ffff'fff0l> (u64, u64);
+template u64 test<0x0000'ffff'ffff'fff0l> (u64, u64);
+template u64 test<0xffff'ffff'ffff'fff0l> (u64, u64);
diff --git a/gcc/testsuite/g++.target/loongarch/bstrins-run.C b/gcc/testsuite/g++.target/loongarch/bstrins-run.C
new file mode 100644
index 00000000000..68913d5e0fc
--- /dev/null
+++ b/gcc/testsuite/g++.target/loongarch/bstrins-run.C
@@ -0,0 +1,65 @@
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+typedef unsigned long gr;
+
+template
+struct mask {
+  enum { value = (1ul << r) - (1ul << l) };
+};
+
+template
+struct mask {
+  enum { value = -(1ul << l) };
+};
+
+__attribute__ ((noipa)) void
+test (gr a, gr b, gr mask, gr out)
+{
+  if (((a & mask) | (b & ~mask)) != out)
+    __builtin_abort ();
+}
+
+__attribute__ ((noipa)) gr
+no_optimize (gr x)
+{
+  return x;
+}
+
+template
+struct test1 {
+  static void
+  run (void)
+  {
+    gr m = mask::value;
+    gr a = no_optimize (-1ul);
+    gr b = no_optimize (0);
+
+    test (a, b, m, (a & m) | (b & ~m));
+    test (a, b, ~m, (a & ~m) | (b & m));
+    test (a, 0, ~m, a & ~m);
+
+    test1::run ();
+  }
+};
+
+template
+struct test1 {
+  static void run (void) {}
+};
+
+template
+void
+test2 (void)
+{
+  test1::run ();
+  test2 ();
+}
+
+template <> void test2 (void) {}
+
+int
+main ()
+{
+  test2<0> ();
+}
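
For readers who want to check the mask arithmetic behind the splitters
outside of GCC, here is a minimal standalone sketch.  It is an illustration
under stated assumptions, not code from the patch: contiguous_field,
bstrins_model and merge_via_bstrins are hypothetical names that do not exist
in GCC, bstrins.d is modelled in plain C++ as rd[msb:lsb] = rj[msb-lsb:0],
and the order of the checks only loosely follows
loongarch_use_bstrins_for_ior_with_mask (prefer the variant that needs no
right shift).

```cpp
#include <cassert>
#include <cstdint>

// All names below are hypothetical helpers for illustration, not GCC APIs.

// Describes a mask of the form ((1 << len) - 1) << lo.
struct Field {
  bool ok;   // true if the mask is a single contiguous run of set bits
  int lo;    // index of the lowest set bit (M in the commit message)
  int len;   // number of set bits in the run (N in the commit message)
};

Field
contiguous_field (std::uint64_t mask)
{
  Field f = { false, 0, 0 };
  if (mask == 0)
    return f;
  while ((mask & 1) == 0) { mask >>= 1; ++f.lo; }   // skip trailing zeros
  while ((mask & 1) != 0) { mask >>= 1; ++f.len; }  // count the run of ones
  f.ok = (mask == 0);   // leftover bits mean a second run of ones
  return f;
}

// Software model of "bstrins.d rd, rj, msb, lsb":
// bits msb..lsb of rd are replaced by bits (msb - lsb)..0 of rj.
std::uint64_t
bstrins_model (std::uint64_t rd, std::uint64_t rj, int msb, int lsb)
{
  assert (0 <= lsb && lsb <= msb && msb < 64);
  int len = msb - lsb + 1;
  std::uint64_t ones = len == 64 ? ~std::uint64_t (0)
                                 : (std::uint64_t (1) << len) - 1;
  return (rd & ~(ones << lsb)) | ((rj & ones) << lsb);
}

struct Merge {
  bool ok;               // false if neither mask nor ~mask is contiguous
  int msb, lsb;          // bit range given to the modelled bstrins.d
  std::uint64_t value;   // (a & mask) | (b & ~mask) when ok
};

// Compute (a & mask) | (b & ~mask) with one modelled bstrins.d.
Merge
merge_via_bstrins (std::uint64_t a, std::uint64_t b, std::uint64_t mask)
{
  Field inv = contiguous_field (~mask);
  if (inv.ok && inv.lo == 0)   // ~mask starts at bit 0: no shift needed
    return { true, inv.len - 1, 0, bstrins_model (a, b, inv.len - 1, 0) };

  Field f = contiguous_field (mask);
  if (f.ok)                    // insert the field of a into b
    return { true, f.lo + f.len - 1, f.lo,
             bstrins_model (b, a >> f.lo, f.lo + f.len - 1, f.lo) };

  if (inv.ok)                  // insert the field of b into a
    return { true, inv.lo + inv.len - 1, inv.lo,
             bstrins_model (a, b >> inv.lo, inv.lo + inv.len - 1, inv.lo) };

  return { false, 0, 0, 0 };
}
```

With mask = 0xf0 the sketch selects bits 7..4, and with
mask = 0xfffffffffffffff0 it works on the inverted mask and selects bits
3..0, which agrees with the "7,4" and "3,0" operands that bstrins-compile.C
scans for.  A mask such as 0x0f0f has two separate runs of ones and is
rejected.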