From patchwork Sun Oct  8 19:07:11 2023
X-Patchwork-Submitter: Roger Sayle
X-Patchwork-Id: 149770
From: "Roger Sayle"
To: gcc-patches@gcc.gnu.org
Cc: "'Claudiu Zissulescu'"
Subject: [ARC PATCH] Improved SImode shifts and rotates on !TARGET_BARREL_SHIFTER.
Date: Sun, 8 Oct 2023 20:07:11 +0100
Message-ID: <002701d9fa1a$a55dec70$f019c550$@nextmovesoftware.com>

This patch completes the ARC back-end's transition to using pre-reload
splitters for SImode shifts and rotates on targets without a barrel
shifter.  The core part is that the shift_si3 define_insn is no longer
needed, as shifts and rotates that don't require a loop are split
before reload, and then because shift_si3_loop is the only caller
of output_shift, both can be significantly cleaned up and simplified.
The output_shift function (Claudiu's "the elephant in the room") is
renamed output_shift_loop, which handles just the four instruction
zero-overhead loop implementations.

Aside from the clean-ups, the user visible changes are much improved
implementations of SImode shifts and rotates on affected targets.

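For readers who want to see what the zero-overhead loop computes, here
is a minimal C sketch of a constant logical right shift done one bit at
a time.  It is only a model, not code from the patch: the names
loop_iterations, lsr_by_loop and check_lsr_by_loop are hypothetical, and
the halving of even counts is meant to mirror the twice_p handling in
output_shift_loop (the ROTATE special case is not modelled).

```c
#include <assert.h>
#include <stdint.h>

/* Number of loop iterations for a constant count: even counts run
   half as many iterations, with two single-bit operations in each.  */
unsigned int
loop_iterations (unsigned int count)
{
  unsigned int n = count & 31;
  return (n & 1) == 0 ? n >> 1 : n;
}

/* Hypothetical model of the loop for a logical right shift.  */
uint32_t
lsr_by_loop (uint32_t x, unsigned int count)
{
  unsigned int n = count & 31;
  int twice = n != 0 && (n & 1) == 0;
  unsigned int i;

  for (i = loop_iterations (count); i > 0; i--)
    {
      x >>= 1;          /* first lsr in the loop body */
      if (twice)
        x >>= 1;        /* second lsr takes the place of the nop */
    }
  return x;
}

/* Small self-check comparing the model against a plain shift.  */
void
check_lsr_by_loop (void)
{
  assert (loop_iterations (10) == 5);
  assert (lsr_by_loop (0x80000000u, 10) == 0x80000000u >> 10);
}
```

Calling check_lsr_by_loop () from a test driver should pass silently;
a shift by 10 needs five iterations in this model, and an odd count
such as 7 still needs seven.
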
For the function:

unsigned int rotr_1 (unsigned int x) { return (x >> 1) | (x << 31); }

GCC with -O2 -mcpu=em would previously generate:

rotr_1: lsr_s   r2,r0
        bmsk_s  r0,r0,0
        ror     r0,r0
        j_s.d   [blink]
        or_s    r0,r0,r2

with this patch, we now generate:

        j_s.d   [blink]
        ror     r0,r0

For the function:

unsigned int rotr_31 (unsigned int x) { return (x >> 31) | (x << 1); }

GCC with -O2 -mcpu=em would previously generate:

rotr_31:
        mov_s   r2,r0   ;4
        asl_s   r0,r0
        add.f   0,r2,r2
        rlc     r2,0
        j_s.d   [blink]
        or_s    r0,r0,r2

with this patch we now generate an add.f followed by an adc:

rotr_31:
        add.f   r0,r0,r0
        j_s.d   [blink]
        add.cs  r0,r0,1

Shifts by constants requiring a loop have been improved for even counts
by performing two operations in each iteration:

int shl10(int x) { return x >> 10; }

Previously looked like:

shl10:  mov.f lp_count, 10
        lpnz    2f
        asr r0,r0
        nop
2:      # end single insn loop
        j_s     [blink]

And now becomes:

shl10:
        mov     lp_count,5
        lp      2f
        asr     r0,r0
        asr     r0,r0
2:      # end single insn loop
        j_s     [blink]

So emulating ARC's SWAP on architectures that don't have it:

unsigned int rotr_16 (unsigned int x) { return (x >> 16) | (x << 16); }

previously required 10 instructions and ~70 cycles:

rotr_16:
        mov_s   r2,r0   ;4
        mov.f lp_count, 16
        lpnz    2f
        add r0,r0,r0
        nop
2:      # end single insn loop
        mov.f lp_count, 16
        lpnz    2f
        lsr r2,r2
        nop
2:      # end single insn loop
        j_s.d   [blink]
        or_s    r0,r0,r2

now becomes just 4 instructions and ~18 cycles:

rotr_16:
        mov     lp_count,8
        lp      2f
        ror     r0,r0
        ror     r0,r0
2:      # end single insn loop
        j_s     [blink]

This patch has been tested with a cross-compiler to arc-linux hosted
on x86_64-pc-linux-gnu and (partially) tested with the compile-only
portions of the testsuite with no regressions.  Ok for mainline, if
your own testing shows no issues?


2023-10-07  Roger Sayle

gcc/ChangeLog
	* config/arc/arc-protos.h (output_shift): Rename to...
	(output_shift_loop): Tweak API to take an explicit rtx_code.
	(arc_split_ashl): Prototype new function here.
	(arc_split_ashr): Likewise.
	(arc_split_lshr): Likewise.
	(arc_split_rotl): Likewise.
	(arc_split_rotr): Likewise.
	* config/arc/arc.cc (output_shift): Delete local prototype.  Rename.
	(output_shift_loop): New function replacing output_shift to output
	a zero-overhead loop for SImode shifts and rotates on ARC targets
	without barrel shifter (i.e. no hardware support for these insns).
	(arc_split_ashl): New helper function to split *ashlsi3_nobs.
	(arc_split_ashr): New helper function to split *ashrsi3_nobs.
	(arc_split_lshr): New helper function to split *lshrsi3_nobs.
	(arc_split_rotl): New helper function to split *rotlsi3_nobs.
	(arc_split_rotr): New helper function to split *rotrsi3_nobs.
	* config/arc/arc.md (any_shift_rotate): New define_code_iterator.
	(define_code_attr insn): New code attribute to map to pattern name.
	(<insn>si3): New expander unifying previous ashlsi3, ashrsi3 and
	lshrsi3 define_expands.  Adds rotlsi3 and rotrsi3.
	(*<insn>si3_nobs): New define_insn_and_split that unifies the
	previous *ashlsi3_nobs, *ashrsi3_nobs and *lshrsi3_nobs.
	We now call arc_split_<insn> in arc.cc to implement each split.
	(shift_si3): Delete define_insn, all shifts/rotates are now split.
	(shift_si3_loop): Rename to...
	(<insn>si3_loop): define_insn to handle loop implementations of
	SImode shifts and rotates, calling output_shift_loop for template.
	(rotrsi3): Rename to...
	(*rotrsi3_insn): define_insn for TARGET_BARREL_SHIFTER's ror.
	(*rotlsi3): New define_insn_and_split to transform left rotates
	into right rotates before reload.
	(rotlsi3_cnt1): New define_insn_and_split to implement a left
	rotate by one bit using an add.f followed by an adc.
	* config/arc/predicates.md (shiftr4_operator): Delete.

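The last two arc.md entries above rest on two small identities, which
the following C sketch tries to make concrete.  This is an illustration
under my own assumptions rather than code from the patch: the function
names are hypothetical, the carry flag is modelled as the bit shifted
out of the 32-bit add, and rotr32 stands in for the hardware ror.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate left by one bit, modelled on an add.f followed by an adc:
   x + x shifts left by one, and the carry out is the old top bit,
   which the add-with-carry puts back into bit 0.  */
uint32_t
rotl1_add_adc (uint32_t x)
{
  uint32_t carry = x >> 31;     /* carry flag set by add.f */
  uint32_t sum = x + x;         /* wraps modulo 2^32 */
  return sum + carry;           /* adc with a zero operand */
}

/* Reference rotate right, guarding against a shift by 32.  */
uint32_t
rotr32 (uint32_t x, unsigned int n)
{
  n &= 31;
  return n ? (x >> n) | (x << (32 - n)) : x;
}

/* Rotate left by N expressed as a rotate right by 32 - N, with a
   count of zero handled as a plain move.  */
uint32_t
rotl_via_rotr (uint32_t x, unsigned int n)
{
  n &= 31;
  if (n == 0)
    return x;
  return rotr32 (x, 32 - n);
}

/* Small self-check tying the two models together.  */
void
check_rotl_models (void)
{
  assert (rotl1_add_adc (0x80000001u) == 0x00000003u);
  assert (rotl_via_rotr (0x80000001u, 1) == rotl1_add_adc (0x80000001u));
}
```

Calling check_rotl_models () should pass silently.  The zero-count guard
in rotl_via_rotr matters in C because a shift by 32 is undefined there.
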
Thanks in advance,
Roger

diff --git a/gcc/config/arc/arc-protos.h b/gcc/config/arc/arc-protos.h
index 026ea99..a48d850 100644
--- a/gcc/config/arc/arc-protos.h
+++ b/gcc/config/arc/arc-protos.h
@@ -25,7 +25,12 @@ extern machine_mode arc_select_cc_mode (enum rtx_code, rtx, rtx);
 extern struct rtx_def *gen_compare_reg (rtx, machine_mode);
 
 /* Declarations for various fns used in the .md file.  */
-extern const char *output_shift (rtx *);
+extern const char *output_shift_loop (enum rtx_code, rtx *);
+extern void arc_split_ashl (rtx *);
+extern void arc_split_ashr (rtx *);
+extern void arc_split_lshr (rtx *);
+extern void arc_split_rotl (rtx *);
+extern void arc_split_rotr (rtx *);
 extern bool compact_sda_memory_operand (rtx, machine_mode, bool);
 extern bool arc_double_limm_p (rtx);
 extern void arc_print_operand (FILE *, rtx, int);
diff --git a/gcc/config/arc/arc.cc b/gcc/config/arc/arc.cc
index ecc681c..92dcf21 100644
--- a/gcc/config/arc/arc.cc
+++ b/gcc/config/arc/arc.cc
@@ -241,7 +241,6 @@ static int branch_dest (rtx);
 static void arc_output_pic_addr_const (FILE *, rtx, int);
 static bool arc_function_ok_for_sibcall (tree, tree);
 static rtx arc_function_value (const_tree, const_tree, bool);
-const char * output_shift (rtx *);
 static void arc_reorg (void);
 static bool arc_in_small_data_p (const_tree);
 
@@ -4151,143 +4150,287 @@ arc_pre_reload_split (void)
           && !(cfun->curr_properties & PROP_rtl_split_insns));
 }
 
-/* Output the assembler code for doing a shift.
-   We go to a bit of trouble to generate efficient code as the ARC601 only has
-   single bit shifts.  This is taken from the h8300 port.  We only have one
-   mode of shifting and can't access individual bytes like the h8300 can, so
-   this is greatly simplified (at the expense of not generating hyper-
-   efficient code).
-
-   This function is not used if the variable shift insns are present.  */
-
-/* FIXME:  This probably can be done using a define_split in arc.md.
-   Alternately, generate rtx rather than output instructions.  */
+/* Output the assembler code for a zero-overhead loop doing a shift
+   or rotate.  We know OPERANDS[0] == OPERANDS[1], and the bit count
+   is OPERANDS[2].  */
 
 const char *
-output_shift (rtx *operands)
+output_shift_loop (enum rtx_code code, rtx *operands)
 {
-  /*  static int loopend_lab;*/
-  rtx shift = operands[3];
-  machine_mode mode = GET_MODE (shift);
-  enum rtx_code code = GET_CODE (shift);
-  const char *shift_one;
-
-  gcc_assert (mode == SImode);
-
-  switch (code)
-    {
-    case ASHIFT:   shift_one = "add %0,%1,%1"; break;
-    case ASHIFTRT: shift_one = "asr %0,%1"; break;
-    case LSHIFTRT: shift_one = "lsr %0,%1"; break;
-    default:       gcc_unreachable ();
-    }
+  bool twice_p = false;
+  gcc_assert (GET_MODE (operands[0]) == SImode);
 
   if (GET_CODE (operands[2]) != CONST_INT)
     {
-      output_asm_insn ("and.f lp_count,%2, 0x1f", operands);
-      goto shiftloop;
+      output_asm_insn ("and.f\tlp_count,%2,0x1f", operands);
+      output_asm_insn ("lpnz\t2f", operands);
     }
   else
     {
-      int n;
+      int n = INTVAL (operands[2]) & 31;
+      if (!n)
+        {
+          output_asm_insn ("mov\t%0,%1",operands);
+          return "";
+        }
+
+      if ((n & 1) == 0 && code != ROTATE)
+        {
+          twice_p = true;
+          n >>= 1;
+        }
+      operands[2] = GEN_INT (n);
+      output_asm_insn ("mov\tlp_count,%2", operands);
+      output_asm_insn ("lp\t2f", operands);
+    }
+
+  switch (code)
+    {
+    case ASHIFT:
+      output_asm_insn ("add\t%0,%1,%1", operands);
+      if (twice_p)
+        output_asm_insn ("add\t%0,%1,%1", operands);
+      break;
+    case ASHIFTRT:
+      output_asm_insn ("asr\t%0,%1", operands);
+      if (twice_p)
+        output_asm_insn ("asr\t%0,%1", operands);
+      break;
+    case LSHIFTRT:
+      output_asm_insn ("lsr\t%0,%1", operands);
+      if (twice_p)
+        output_asm_insn ("lsr\t%0,%1", operands);
+      break;
+    case ROTATERT:
+      output_asm_insn ("ror\t%0,%1", operands);
+      if (twice_p)
+        output_asm_insn ("ror\t%0,%1", operands);
+      break;
+    case ROTATE:
+      output_asm_insn ("add.f\t%0,%1,%1", operands);
+      output_asm_insn ("adc\t%0,%0,0", operands);
+      twice_p = true;
+      break;
+    default:
+      gcc_unreachable ();
+    }
 
-      n = INTVAL (operands[2]);
+  if (!twice_p)
+    output_asm_insn ("nop", operands);
+  fprintf (asm_out_file, "2:\t%s end single insn loop\n", ASM_COMMENT_START);
+  return "";
+}
 
-      /* Only consider the lower 5 bits of the shift count.  */
-      n = n & 0x1f;
 
-      /* First see if we can do them inline.  */
-      /* ??? We could get better scheduling & shorter code (using short insns)
-         by using splitters.  Alas, that'd be even more verbose.  */
-      if (code == ASHIFT && n <= 9 && n > 2
-          && dest_reg_operand (operands[4], SImode))
+/* Split SImode left shift instruction.  */
+void
+arc_split_ashl (rtx *operands)
+{
+  if (CONST_INT_P (operands[2]))
+    {
+      int n = INTVAL (operands[2]) & 0x1f;
+      if (n <= 9)
         {
-          output_asm_insn ("mov %4,0\n\tadd3 %0,%4,%1", operands);
-          for (n -=3 ; n >= 3; n -= 3)
-            output_asm_insn ("add3 %0,%4,%0", operands);
-          if (n == 2)
-            output_asm_insn ("add2 %0,%4,%0", operands);
-          else if (n)
-            output_asm_insn ("add %0,%0,%0", operands);
+          if (n == 0)
+            emit_move_insn (operands[0], operands[1]);
+          else if (n <= 2)
+            {
+              emit_insn (gen_ashlsi3_cnt1 (operands[0], operands[1]));
+              if (n == 2)
+                emit_insn (gen_ashlsi3_cnt1 (operands[0], operands[0]));
+            }
+          else
+            {
+              rtx zero = gen_reg_rtx (SImode);
+              emit_move_insn (zero, const0_rtx);
+              emit_insn (gen_add_shift (operands[0], operands[1],
+                                        GEN_INT (3), zero));
+              for (n -= 3; n >= 3; n -= 3)
+                emit_insn (gen_add_shift (operands[0], operands[0],
+                                          GEN_INT (3), zero));
+              if (n == 2)
+                emit_insn (gen_add_shift (operands[0], operands[0],
+                                          const2_rtx, zero));
+              else if (n)
+                emit_insn (gen_ashlsi3_cnt1 (operands[0], operands[0]));
+            }
+          return;
         }
-      else if (n <= 4)
+      else if (n >= 29)
         {
-          while (--n >= 0)
+          if (n < 31)
             {
-              output_asm_insn (shift_one, operands);
-              operands[1] = operands[0];
+              if (n == 29)
+                {
+                  emit_insn (gen_andsi3_i (operands[0], operands[1],
+                                           GEN_INT (7)));
+                  emit_insn (gen_rotrsi3_cnt1 (operands[0], operands[0]));
+                }
+              else
+                emit_insn (gen_andsi3_i (operands[0], operands[1],
+                                         GEN_INT (3)));
+              emit_insn (gen_rotrsi3_cnt1 (operands[0], operands[0]));
             }
+          else
+            emit_insn (gen_andsi3_i (operands[0], operands[1], const1_rtx));
+          emit_insn (gen_rotrsi3_cnt1 (operands[0], operands[0]));
+          return;
         }
-      /* See if we can use a rotate/and.  */
-      else if (n == BITS_PER_WORD - 1)
+    }
+
+  emit_insn (gen_ashlsi3_loop (operands[0], operands[1], operands[2]));
+}
+
+/* Split SImode arithmetic right shift instruction.  */
+void
+arc_split_ashr (rtx *operands)
+{
+  if (CONST_INT_P (operands[2]))
+    {
+      int n = INTVAL (operands[2]) & 0x1f;
+      if (n <= 4)
         {
-          switch (code)
+          if (n != 0)
             {
-            case ASHIFT :
-              output_asm_insn ("and %0,%1,1\n\tror %0,%0", operands);
-              break;
-            case ASHIFTRT :
-              /* The ARC doesn't have a rol insn.  Use something else.  */
-              output_asm_insn ("add.f 0,%1,%1\n\tsbc %0,%0,%0", operands);
-              break;
-            case LSHIFTRT :
-              /* The ARC doesn't have a rol insn.  Use something else.  */
-              output_asm_insn ("add.f 0,%1,%1\n\trlc %0,0", operands);
-              break;
-            default:
-              break;
+              emit_insn (gen_ashrsi3_cnt1 (operands[0], operands[1]));
+              while (--n > 0)
+                emit_insn (gen_ashrsi3_cnt1 (operands[0], operands[0]));
+            }
+          else
+            emit_move_insn (operands[0], operands[1]);
+          return;
+        }
+      else if (n == 30)
+        {
+          rtx tmp = gen_reg_rtx (SImode);
+          emit_insn (gen_add_f (tmp, operands[1], operands[1]));
+          emit_insn (gen_sbc (operands[0], operands[0], operands[0]));
+          emit_insn (gen_addsi_compare_2 (tmp, tmp));
+          emit_insn (gen_adc (operands[0], operands[0], operands[0]));
+          return;
+        }
+      else if (n == 31)
+        {
+          emit_insn (gen_addsi_compare_2 (operands[1], operands[1]));
+          emit_insn (gen_sbc (operands[0], operands[0], operands[0]));
+          return;
+        }
+    }
+
+  emit_insn (gen_ashrsi3_loop (operands[0], operands[1], operands[2]));
+}
+
+/* Split SImode logical right shift instruction.  */
+void
+arc_split_lshr (rtx *operands)
+{
+  if (CONST_INT_P (operands[2]))
+    {
+      int n = INTVAL (operands[2]) & 0x1f;
+      if (n <= 4)
+        {
+          if (n != 0)
+            {
+              emit_insn (gen_lshrsi3_cnt1 (operands[0], operands[1]));
+              while (--n > 0)
+                emit_insn (gen_lshrsi3_cnt1 (operands[0], operands[0]));
             }
+          else
+            emit_move_insn (operands[0], operands[1]);
+          return;
         }
-      else if (n == BITS_PER_WORD - 2 && dest_reg_operand (operands[4], SImode))
+      else if (n == 30)
         {
-          switch (code)
+          rtx tmp = gen_reg_rtx (SImode);
+          emit_insn (gen_add_f (tmp, operands[1], operands[1]));
+          emit_insn (gen_scc_ltu_cc_c (operands[0]));
+          emit_insn (gen_addsi_compare_2 (tmp, tmp));
+          emit_insn (gen_adc (operands[0], operands[0], operands[0]));
+          return;
+        }
+      else if (n == 31)
+        {
+          emit_insn (gen_addsi_compare_2 (operands[1], operands[1]));
+          emit_insn (gen_scc_ltu_cc_c (operands[0]));
+          return;
+        }
+    }
+
+  emit_insn (gen_lshrsi3_loop (operands[0], operands[1], operands[2]));
+}
+
+/* Split SImode rotate left instruction.  */
+void
+arc_split_rotl (rtx *operands)
+{
+  if (CONST_INT_P (operands[2]))
+    {
+      int n = INTVAL (operands[2]) & 0x1f;
+      if (n <= 2)
+        {
+          if (n != 0)
             {
-            case ASHIFT :
-              output_asm_insn ("and %0,%1,3\n\tror %0,%0\n\tror %0,%0", operands);
-              break;
-            case ASHIFTRT :
-#if 1 /* Need some scheduling comparisons.  */
-              output_asm_insn ("add.f %4,%1,%1\n\tsbc %0,%0,%0\n\t"
-                               "add.f 0,%4,%4\n\trlc %0,%0", operands);
-#else
-              output_asm_insn ("add.f %4,%1,%1\n\tbxor %0,%4,31\n\t"
-                               "sbc.f %0,%0,%4\n\trlc %0,%0", operands);
-#endif
-              break;
-            case LSHIFTRT :
-#if 1
-              output_asm_insn ("add.f %4,%1,%1\n\trlc %0,0\n\t"
-                               "add.f 0,%4,%4\n\trlc %0,%0", operands);
-#else
-              output_asm_insn ("add.f %0,%1,%1\n\trlc.f %0,0\n\t"
-                               "and %0,%0,1\n\trlc %0,%0", operands);
-#endif
-              break;
-            default:
-              break;
+              emit_insn (gen_rotlsi3_cnt1 (operands[0], operands[1]));
+              if (n == 2)
+                emit_insn (gen_rotlsi3_cnt1 (operands[0], operands[0]));
             }
+          else
+            emit_move_insn (operands[0], operands[1]);
+          return;
         }
-      else if (n == BITS_PER_WORD - 3 && code == ASHIFT)
-        output_asm_insn ("and %0,%1,7\n\tror %0,%0\n\tror %0,%0\n\tror %0,%0",
-                         operands);
-      /* Must loop.  */
-      else
+      else if (n >= 28)
         {
-          operands[2] = GEN_INT (n);
-          output_asm_insn ("mov.f lp_count, %2", operands);
+          emit_insn (gen_rotrsi3_cnt1 (operands[0], operands[1]));
+          while (++n < 32)
+            emit_insn (gen_rotrsi3_cnt1 (operands[0], operands[0]));
+          return;
+        }
+      else if (n >= 16 || n == 12 || n == 14)
+        {
+          emit_insn (gen_rotrsi3_loop (operands[0], operands[1],
+                                       GEN_INT (32 - n)));
+          return;
+        }
+    }
+
+  emit_insn (gen_rotlsi3_loop (operands[0], operands[1], operands[2]));
+}
 
-        shiftloop:
+/* Split SImode rotate right instruction.  */
+void
+arc_split_rotr (rtx *operands)
+{
+  if (CONST_INT_P (operands[2]))
+    {
+      int n = INTVAL (operands[2]) & 0x1f;
+      if (n <= 4)
+        {
+          if (n != 0)
             {
-              output_asm_insn ("lpnz\t2f", operands);
-              output_asm_insn (shift_one, operands);
-              output_asm_insn ("nop", operands);
-              fprintf (asm_out_file, "2:\t%s end single insn loop\n",
-                       ASM_COMMENT_START);
+              emit_insn (gen_rotrsi3_cnt1 (operands[0], operands[1]));
+              while (--n > 0)
+                emit_insn (gen_rotrsi3_cnt1 (operands[0], operands[0]));
             }
+          else
+            emit_move_insn (operands[0], operands[1]);
+          return;
+        }
+      else if (n >= 30)
+        {
+          emit_insn (gen_rotlsi3_cnt1 (operands[0], operands[1]));
+          if (n == 31)
+            emit_insn (gen_rotlsi3_cnt1 (operands[1], operands[1]));
+          return;
+        }
+      else if (n >= 21 || n == 17 || n == 19)
+        {
+          emit_insn (gen_rotrsi3_loop (operands[0], operands[1],
+                                       GEN_INT (32 - n)));
+          return;
         }
     }
 
-  return "";
+  emit_insn (gen_rotrsi3_loop (operands[0], operands[1], operands[2]));
 }
 
 /* Nested function support.  */
diff --git a/gcc/config/arc/arc.md b/gcc/config/arc/arc.md
index cedb951..61f60f2 100644
--- a/gcc/config/arc/arc.md
+++ b/gcc/config/arc/arc.md
@@ -3354,22 +3354,16 @@ archs4x, archs4xd"
 
 ;; Shift instructions.
 
-(define_expand "ashlsi3"
-  [(set (match_operand:SI 0 "dest_reg_operand" "")
-        (ashift:SI (match_operand:SI 1 "register_operand" "")
-                   (match_operand:SI 2 "nonmemory_operand" "")))]
-  "")
+(define_code_iterator any_shift_rotate [ashift ashiftrt lshiftrt
+                                        rotate rotatert])
 
-(define_expand "ashrsi3"
-  [(set (match_operand:SI 0 "dest_reg_operand" "")
-        (ashiftrt:SI (match_operand:SI 1 "register_operand" "")
-                     (match_operand:SI 2 "nonmemory_operand" "")))]
-  "")
+(define_code_attr insn [(ashift "ashl") (ashiftrt "ashr") (lshiftrt "lshr")
+                        (rotate "rotl") (rotatert "rotr")])
 
-(define_expand "lshrsi3"
+(define_expand "<insn>si3"
   [(set (match_operand:SI 0 "dest_reg_operand" "")
-        (lshiftrt:SI (match_operand:SI 1 "register_operand" "")
-                     (match_operand:SI 2 "nonmemory_operand" "")))]
+        (any_shift_rotate:SI (match_operand:SI 1 "register_operand" "")
+                             (match_operand:SI 2 "nonmemory_operand" "")))]
   "")
 
 ; asl, asr, lsr patterns:
@@ -3422,117 +3416,10 @@ archs4x, archs4xd"
    (set_attr "predicable" "no,no,yes,no,no")
    (set_attr "cond" "nocond,canuse,canuse,nocond,nocond")])
 
-(define_insn_and_split "*ashlsi3_nobs"
-  [(set (match_operand:SI 0 "dest_reg_operand")
-        (ashift:SI (match_operand:SI 1 "register_operand")
-                   (match_operand:SI 2 "nonmemory_operand")))]
-  "!TARGET_BARREL_SHIFTER
-   && operands[2] != const1_rtx
-   && arc_pre_reload_split ()"
-  "#"
-  "&& 1"
-  [(const_int 0)]
-{
-  if (CONST_INT_P (operands[2]))
-    {
-      int n = INTVAL (operands[2]) & 0x1f;
-      if (n <= 9)
-        {
-          if (n == 0)
-            emit_move_insn (operands[0], operands[1]);
-          else if (n <= 2)
-            {
-              emit_insn (gen_ashlsi3_cnt1 (operands[0], operands[1]));
-              if (n == 2)
-                emit_insn (gen_ashlsi3_cnt1 (operands[0], operands[0]));
-            }
-          else
-            {
-              rtx zero = gen_reg_rtx (SImode);
-              emit_move_insn (zero, const0_rtx);
-              emit_insn (gen_add_shift (operands[0], operands[1],
-                                        GEN_INT (3), zero));
-              for (n -= 3; n >= 3; n -= 3)
-                emit_insn (gen_add_shift (operands[0], operands[0],
-                                          GEN_INT (3), zero));
-              if (n == 2)
-                emit_insn (gen_add_shift (operands[0], operands[0],
-                                          const2_rtx, zero));
-              else if (n)
-                emit_insn (gen_ashlsi3_cnt1 (operands[0], operands[0]));
-            }
-          DONE;
-        }
-      else if (n >= 29)
-        {
-          if (n < 31)
-            {
-              if (n == 29)
-                {
-                  emit_insn (gen_andsi3_i (operands[0], operands[1],
-                                           GEN_INT (7)));
-                  emit_insn (gen_rotrsi3_cnt1 (operands[0], operands[0]));
-                }
-              else
-                emit_insn (gen_andsi3_i (operands[0], operands[1],
-                                         GEN_INT (3)));
-              emit_insn (gen_rotrsi3_cnt1 (operands[0], operands[0]));
-            }
-          else
-            emit_insn (gen_andsi3_i (operands[0], operands[1], const1_rtx));
-          emit_insn (gen_rotrsi3_cnt1 (operands[0], operands[0]));
-          DONE;
-        }
-    }
-
-  rtx shift = gen_rtx_fmt_ee (ASHIFT, SImode, operands[1], operands[2]);
-  emit_insn (gen_shift_si3_loop (operands[0], operands[1],
-                                 operands[2], shift));
-  DONE;
-})
-
-(define_insn_and_split "*ashlri3_nobs"
-  [(set (match_operand:SI 0 "dest_reg_operand")
-        (ashiftrt:SI (match_operand:SI 1 "register_operand")
-                     (match_operand:SI 2 "nonmemory_operand")))]
-  "!TARGET_BARREL_SHIFTER
-   && operands[2] != const1_rtx
-   && arc_pre_reload_split ()"
-  "#"
-  "&& 1"
-  [(const_int 0)]
-{
-  if (CONST_INT_P (operands[2]))
-    {
-      int n = INTVAL (operands[2]) & 0x1f;
-      if (n <= 4)
-        {
-          if (n != 0)
-            {
-              emit_insn (gen_ashrsi3_cnt1 (operands[0], operands[1]));
-              while (--n > 0)
-                emit_insn (gen_ashrsi3_cnt1 (operands[0], operands[0]));
-            }
-          else
-            emit_move_insn (operands[0], operands[1]);
-          DONE;
-        }
-    }
-
-  rtx pat;
-  rtx shift = gen_rtx_fmt_ee (ASHIFTRT, SImode, operands[1], operands[2]);
-  if (shiftr4_operator (shift, SImode))
-    pat = gen_shift_si3 (operands[0], operands[1], operands[2], shift);
-  else
-    pat = gen_shift_si3_loop (operands[0], operands[1], operands[2], shift);
-  emit_insn (pat);
-  DONE;
-})
-
-(define_insn_and_split "*lshrsi3_nobs"
+(define_insn_and_split "*<insn>si3_nobs"
   [(set (match_operand:SI 0 "dest_reg_operand")
-        (lshiftrt:SI (match_operand:SI 1 "register_operand")
-                     (match_operand:SI 2 "nonmemory_operand")))]
+        (any_shift_rotate:SI (match_operand:SI 1 "register_operand")
+                             (match_operand:SI 2 "nonmemory_operand")))]
   "!TARGET_BARREL_SHIFTER
    && operands[2] != const1_rtx
    && arc_pre_reload_split ()"
@@ -3540,66 +3427,28 @@ archs4x, archs4xd"
   "&& 1"
   [(const_int 0)]
 {
-  if (CONST_INT_P (operands[2]))
-    {
-      int n = INTVAL (operands[2]) & 0x1f;
-      if (n <= 4)
-        {
-          if (n != 0)
-            {
-              emit_insn (gen_lshrsi3_cnt1 (operands[0], operands[1]));
-              while (--n > 0)
-                emit_insn (gen_lshrsi3_cnt1 (operands[0], operands[0]));
-            }
-          else
-            emit_move_insn (operands[0], operands[1]);
-          DONE;
-        }
-    }
-
-  rtx pat;
-  rtx shift = gen_rtx_fmt_ee (LSHIFTRT, SImode, operands[1], operands[2]);
-  if (shiftr4_operator (shift, SImode))
-    pat = gen_shift_si3 (operands[0], operands[1], operands[2], shift);
-  else
-    pat = gen_shift_si3_loop (operands[0], operands[1], operands[2], shift);
-  emit_insn (pat);
+  arc_split_<insn> (operands);
   DONE;
 })
 
-;; shift_si3 appears after {ashr,lshr}si3_nobs
-(define_insn "shift_si3"
-  [(set (match_operand:SI 0 "dest_reg_operand" "=r")
-        (match_operator:SI 3 "shiftr4_operator"
-          [(match_operand:SI 1 "register_operand" "0")
-           (match_operand:SI 2 "const_int_operand" "n")]))
-   (clobber (match_scratch:SI 4 "=&r"))
-   (clobber (reg:CC CC_REG))
-  ]
-  "!TARGET_BARREL_SHIFTER
-   && operands[2] != const1_rtx"
-  "* return output_shift (operands);"
-  [(set_attr "type" "shift")
-   (set_attr "length" "16")])
-
-;; shift_si3_loop appears after {ashl,ashr,lshr}si3_nobs
-(define_insn "shift_si3_loop"
+;; <insn>si3_loop appears after <insn>si3_nobs
+(define_insn "<insn>si3_loop"
   [(set (match_operand:SI 0 "dest_reg_operand" "=r,r")
-        (match_operator:SI 3 "shift_operator"
-          [(match_operand:SI 1 "register_operand" "0,0")
-           (match_operand:SI 2 "nonmemory_operand" "rn,Cal")]))
+        (any_shift_rotate:SI
+          (match_operand:SI 1 "register_operand" "0,0")
+          (match_operand:SI 2 "nonmemory_operand" "rn,Cal")))
    (clobber (reg:SI LP_COUNT))
    (clobber (reg:CC CC_REG))
   ]
   "!TARGET_BARREL_SHIFTER
    && operands[2] != const1_rtx"
-  "* return output_shift (operands);"
+  "* return output_shift_loop (<CODE>, operands);"
   [(set_attr "type" "shift")
    (set_attr "length" "16,20")])
 
 ;; Rotate instructions.
 
-(define_insn "rotrsi3"
+(define_insn "rotrsi3_insn"
   [(set (match_operand:SI 0 "dest_reg_operand" "=r, r, r")
         (rotatert:SI (match_operand:SI 1 "arc_nonmemory_operand" " 0,rL,rCsz")
                      (match_operand:SI 2 "nonmemory_operand" "rL,rL,rCal")))]
@@ -3609,6 +3458,35 @@ archs4x, archs4xd"
    (set_attr "predicable" "yes,no,no")
    (set_attr "length" "4,4,8")])
 
+(define_insn_and_split "*rotlsi3"
+  [(set (match_operand:SI 0 "dest_reg_operand")
+        (rotate:SI (match_operand:SI 1 "register_operand")
+                   (match_operand:SI 2 "nonmemory_operand")))]
+  "TARGET_BARREL_SHIFTER
+   && arc_pre_reload_split ()"
+  "#"
+  "&& 1"
+  [(set (match_dup 0) (rotatert:SI (match_dup 1) (match_dup 3)))]
+{
+  if (CONST_INT_P (operands[2]))
+    {
+      int n = INTVAL (operands[2]) & 31;
+      if (n == 0)
+        {
+          emit_move_insn (operands[0], operands[1]);
+          DONE;
+        }
+      else operands[3] = GEN_INT (32 - n);
+    }
+  else
+    {
+      if (!register_operand (operands[2], SImode))
+        operands[2] = force_reg (SImode, operands[2]);
+      operands[3] = gen_reg_rtx (SImode);
+      emit_insn (gen_negsi2 (operands[3], operands[2]));
+    }
+})
+
 ;; Compare / branch instructions.
 
 (define_expand "cbranchsi4"
@@ -5980,6 +5858,20 @@ archs4x, archs4xd"
                 (zero_extract:SI (match_dup 1) (match_dup 5) (match_dup 7)))])
    (match_dup 1)])
 
+(define_insn_and_split "rotlsi3_cnt1"
+  [(set (match_operand:SI 0 "dest_reg_operand" "=r")
+        (rotate:SI (match_operand:SI 1 "register_operand" "r")
+                   (const_int 1)))]
+  "!TARGET_BARREL_SHIFTER"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+{
+  emit_insn (gen_add_f (operands[0], operands[1], operands[1]));
+  emit_insn (gen_adc (operands[0], operands[0], const0_rtx));
+  DONE;
+})
+
 (define_insn "rotrsi3_cnt1"
   [(set (match_operand:SI 0 "dest_reg_operand" "=r")
         (rotatert:SI (match_operand:SI 1 "nonmemory_operand" "rL")
diff --git a/gcc/config/arc/predicates.md b/gcc/config/arc/predicates.md
index e37d884..0cde92e 100644
--- a/gcc/config/arc/predicates.md
+++ b/gcc/config/arc/predicates.md
@@ -549,15 +549,6 @@
   (match_code "ashiftrt, lshiftrt, ashift")
 )
 
-;; Return true if OP is a right shift operator that can be implemented in
-;; four insn words or less without a barrel shifter or multiplier.
-(define_predicate "shiftr4_operator"
-  (and (match_code "ashiftrt, lshiftrt")
-       (match_test "const_int_operand (XEXP (op, 1), VOIDmode) ")
-       (match_test "UINTVAL (XEXP (op, 1)) <= 4U
-                    || INTVAL (XEXP (op, 1)) == 30
-                    || INTVAL (XEXP (op, 1)) == 31")))
-
 (define_predicate "mult_operator"
   (and (match_code "mult")
        (match_test "TARGET_MPY"))
 )
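
As a footnote to the arc_split_ashl changes, the n >= 29 case relies on
the fact that a left shift by 29, 30 or 31 keeps only the low 3, 2 or 1
bits of the input, so masking those bits and rotating right moves them
to the top of the word.  The C sketch below is a hedged illustration of
that reasoning only; ror1 and ashl_high are hypothetical names, and
ror1 stands in for a single-bit ror instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Single-bit rotate right, standing in for one ror instruction.  */
uint32_t
ror1 (uint32_t x)
{
  return (x >> 1) | (x << 31);
}

/* Left shift by N, for N in 29..31 only, as a mask followed by
   32 - N single-bit right rotates.  */
uint32_t
ashl_high (uint32_t x, unsigned int n)
{
  unsigned int rotates = 32 - n;              /* 3, 2 or 1 */
  uint32_t result = x & ((1u << rotates) - 1); /* and with 7, 3 or 1 */

  while (rotates-- > 0)
    result = ror1 (result);
  return result;
}

/* Small self-check against the ordinary C shift.  */
void
check_ashl_high (void)
{
  assert (ashl_high (0xffffffffu, 29) == 0xffffffffu << 29);
  assert (ashl_high (5u, 30) == 5u << 30);
  assert (ashl_high (3u, 31) == 3u << 31);
}
```

Calling check_ashl_high () should pass silently.  The model needs at
most four operations for these counts, which is why they never reach
the loop.
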